Information processing method and information processing system

ABSTRACT

Provided is an information processing method that includes: performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; calculating an error between the first projection result and the second projection result; and training the second model to reduce the error. The conversion process produces an error between the first projection result and the second projection result that is greater than the error between a projection result obtained by performing the projection process on the first feature information and a projection result obtained by performing the projection process on the second feature information.

CROSS REFERENCE TO RELATED APPLICATIONS

This is a continuation application of PCT International Application No. PCT/JP2021/019551 filed on May 24, 2021, designating the United States of America, which is based on and claims priority of Japanese Patent Application No. 2021-016364 filed on Feb. 4, 2021 and U.S. Provisional Patent Application No. 63/048,348 filed on Jul. 6, 2020. The entire disclosures of the above-identified applications, including the specifications, drawings and claims are incorporated herein by reference in their entirety.

FIELD

The present disclosure relates to information processing methods and information processing systems.

BACKGROUND

There has been a technique of changing a configuration for a machine learning process based on the computing resources and performance specifications of a system (see Patent Literature (PTL) 1, for example). Owing to this technique, inference performance is maintained to some extent even with limited computing resources and performance specifications.

In addition, there has been a technique of reducing, based on the distance between input data in a projection space, a difference in inference performance generated between two inference models whose input data are different (see Non Patent Literature (NPL) 1, for example). Owing to this technique, it is possible to reduce such an inference performance difference to some extent even though input data are different between two inference models.

As used herein, inference performance is accuracy or the degree of precision of an inference result relative to correct answer data and is, for example, the correct answer rate of an inference result relative to the entire input data.

CITATION LIST

Patent Literature

-   PTL 1: US Patent Application Publication No. 2016/0328644, the Specification

Non Patent Literature

-   NPL 1: Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell, “Deep domain confusion: Maximizing for domain invariance”, arXiv:1412.3474

SUMMARY

Technical Problem

Unfortunately, a problem is that with the technique disclosed in the aforementioned PTL 1, although inference performance is maintained, an inference result obtained using an inference model obtained through a machine learning process for which a configuration has not been changed may be different from an inference result obtained using an inference model obtained through a machine learning process for which a configuration has been changed.

Another problem is that with the technique disclosed in the aforementioned NPL 1, the distance between input data in a projection space decreases depending on the combination of the input data, and training using a machine learning process may not proceed any further.

The present disclosure solves the conventional problems described above and provides an information processing method and the like that reduce an inference result difference generated between two inference models, irrespective of the combination of input data.

Solution to Problem

An information processing method according to one aspect of the present disclosure is an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

An information processing method according to one aspect of the present disclosure is an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

An information processing system according to one aspect of the present disclosure includes: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result. The second inference model is a model obtained by executing an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

An information processing system according to one aspect of the present disclosure includes: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result. The second inference model is a model obtained by executing an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

Advantageous Effects

With the information processing method and the like according to one aspect of the present disclosure, it is possible to reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

BRIEF DESCRIPTION OF DRAWINGS

These and other advantages and features will become apparent from the following description thereof taken in conjunction with the accompanying Drawings, by way of non-limiting examples of embodiments disclosed herein.

FIG. 1 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 1.

FIG. 2 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 1.

FIG. 3 is a flowchart illustrating processing executed by the information processing system according to Embodiment 1.

FIG. 4 is a block diagram illustrating the functional configuration of an inference system according to Embodiment 1.

FIG. 5 is a flowchart illustrating processing executed by the inference system according to Embodiment 1.

FIG. 6 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 2.

FIG. 7 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 2.

FIG. 8 is a flowchart illustrating processing executed by the information processing system according to Embodiment 2.

FIG. 9 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 3.

FIG. 10 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 3.

FIG. 11 is a flowchart illustrating processing executed by the information processing system according to Embodiment 3.

FIG. 12 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 4.

FIG. 13 is a diagram illustrating a process of changing a projection process in the information processing system according to Embodiment 4.

FIG. 14 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 4.

FIG. 15 is a flowchart illustrating processing executed by the information processing system according to Embodiment 4.

FIG. 16 is a flowchart illustrating a process of changing a projection process in the processing executed by the information processing system according to Embodiment 4.

FIG. 17 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 5.

FIG. 18 is a diagram illustrating a process of changing a conversion process in the information processing system according to Embodiment 5.

FIG. 19 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 5.

FIG. 20 is a flowchart illustrating processing executed by the information processing system according to Embodiment 5.

FIG. 21 is a flowchart illustrating a process of changing a conversion process in the processing executed by the information processing system according to Embodiment 5.

FIG. 22 is a block diagram illustrating the functional configuration of an information processing system according to Embodiment 6.

FIG. 23 is a diagram illustrating a process of changing the combination of a conversion process and a projection process in the information processing system according to Embodiment 6.

FIG. 24 is a diagram illustrating training conducted by a second inference unit in the information processing system according to Embodiment 6.

FIG. 25 is a flowchart illustrating processing executed by the information processing system according to Embodiment 6.

FIG. 26 is a flowchart illustrating a process of changing the combination of a conversion process and a projection process in the processing executed by the information processing system according to Embodiment 6.

FIG. 27 is a flowchart illustrating processing executed by an information processing system according to a variation.

DESCRIPTION OF EMBODIMENTS

(Circumstances Leading to the Present Disclosure)

In relation to the techniques disclosed in the Background section, the inventors have found the following problems.

In recent years, embedding an inference model trained by machine learning such as deep learning in an IoT device has been considered. In terms of cost and privacy, however, it is demanded that such an inference model be operated not in a cloud computing environment or an environment where a graphics processing unit (GPU) is used, but by a processor in a device with limited computing resources such as computing power and memory capacity. In order to perform inference using such a processor with limited computing resources, it is conceivable to compress an inference model using a method such as quantizing the inference model.

The technique disclosed in PTL 1, for example, changes a configuration for a machine learning process based on the computing resources and performance specifications of a system. Accordingly, inference performance is maintained to some extent even with limited computing resources and performance specifications. As used herein, inference performance is accuracy or the degree of precision of an inference result relative to correct answer data, and is the correct answer rate of an inference result relative to the entire input data, for example. When there are a plurality of inference targets in a single input data item, inference performance may be the correct answer rate of inference results relative to all the inference targets in the input data item.

A difference, however, may be generated between the behavior of an inference model that has not been compressed and the behavior of an inference model that has been compressed even though inference performance is maintained. Stated differently, a difference may be generated between an inference result obtained using an inference model that has not been compressed and an inference result obtained using an inference model that has been compressed.

In contrast, the technique disclosed in NPL 1 reduces, based on the distance between input data in a projection space, a difference in inference performance generated between two inference models whose input data are different. Accordingly, it is possible to reduce such an inference performance difference to some extent even though input data are different between two inference models.

Depending on the combination of input data, however, the distance between inference results in a projection space which are output by two inference models based on input data decreases, and training using a machine learning process may not proceed any further. When input data are the same as or similar to each other, for example, the distance, in a projection space, between inference results to be output decreases, and this may make it difficult for the training to proceed.

In view of such problems as described above, the inventors repeated dedicated studies and experiments. As a result, the inventors arrived at the subsequently described information processing method and the like according to one aspect of the present disclosure. The information processing method can reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

An information processing method according to one aspect of the present disclosure is an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

According to the aspect, the information processing method performs a conversion process on first feature information and second feature information so that the error between a first projection result and a second projection result is greater than the error between a first non-conversion projection result and a second non-conversion projection result. As a result, it is possible to conduct training more smoothly than the case of using the error between a first non-conversion projection result and a second non-conversion projection result for training using a machine learning process. In addition, the information processing method trains a second inference model to reduce the error between a first projection result and a second projection result. As a result, the second inference model is trained to output the same inference result as that obtained using a first inference model. In other words, the information processing method can reduce an inference result difference to be generated between the first inference model and the second inference model. Specifically, the information processing method can thus reduce an inference result difference to be generated when obtaining a new inference model using an inference model as an exemplar. Accordingly, the information processing method can reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.
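The relationship between the first error and the error between the non-conversion projection results can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration only: the conversion process is taken to be a plain gain (`convert`), the projection process a fixed linear map (`project`), the error a mean squared error, and the feature values are made up. The disclosure does not prescribe any of these choices.

```python
import numpy as np

def convert(features, gain=10.0):
    # Hypothetical conversion process: a plain scale conversion with gain > 1.
    return gain * features

def project(values, matrix):
    # Hypothetical projection process: a fixed linear map to a 2-D space.
    return values @ matrix

def mean_squared_error(a, b):
    return float(np.mean((a - b) ** 2))

# Made-up feature information from the first and second inference models.
first_features = np.array([[0.50, 0.20, 0.10]])
second_features = np.array([[0.48, 0.21, 0.12]])
projection_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# First error: conversion process, then projection process.
first_error = mean_squared_error(
    project(convert(first_features), projection_matrix),
    project(convert(second_features), projection_matrix),
)

# Error between the two non-conversion projection results.
non_conversion_error = mean_squared_error(
    project(first_features, projection_matrix),
    project(second_features, projection_matrix),
)
```

With a linear projection and a gain of 10, the first error in this sketch is 100 times the non-conversion error, so the quantity that training tries to reduce is larger than it would be without the conversion process.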

In the training of the second inference model, the second inference model may be trained by machine learning using also a second error indicating a difference between a first inference result and a second inference result, where the first inference result is additionally obtained by inputting the first data to the first inference model, and the second inference result is additionally obtained by inputting the first data to the second inference model.

According to the aspect, the second inference model is trained using also the error between an inference result for first data, which is obtained from the first inference model (a first inference result), and an inference result for the first data, which is obtained from the second inference model (a second inference result). Since the training is performed not only to reduce the difference between projection results but also to directly reduce the difference between an inference result obtained using the first inference model and an inference result obtained using the second inference model, it is possible to reduce even more an inference result difference to be generated between these two inference models.
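One way to use the second error together with the first error is a weighted sum, sketched below. The weighted-sum form, the `weight` parameter, the mean squared error, and all numeric values are assumptions for illustration; the disclosure only states that the second error is also used in training.

```python
import numpy as np

def second_error(first_result, second_result):
    # Difference between the two inference results (mean squared error assumed).
    return float(np.mean((first_result - second_result) ** 2))

def total_loss(first_error, second_error_value, weight=0.5):
    # Hypothetical combination: first error plus a weighted second error.
    return first_error + weight * second_error_value

# Made-up inference results (e.g. class scores) for the same first data.
first_result = np.array([0.7, 0.2, 0.1])
second_result = np.array([0.6, 0.3, 0.1])

# 0.2 stands in for a first error computed from the projection results.
loss = total_loss(0.2, second_error(first_result, second_result))
```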

An information processing method according to one aspect of the present disclosure is an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

According to the aspect, the information processing method performs a conversion process on first feature information and second feature information so that the error between a first projection result and a second projection result is greater than the error between a first non-conversion projection result and a second non-conversion projection result. As a result, it is possible to conduct training more smoothly than the case of using the error between a first non-conversion projection result and a second non-conversion projection result for training using a machine learning process. In addition, the information processing method trains a third inference model to reduce the error between a first projection result and a second projection result. By obtaining a new second inference model from the trained third inference model through a model conversion process, a second inference model is updated. It can be said that as a result, the second inference model is indirectly trained to output the same inference result as that obtained using a first inference model. In other words, the information processing method can reduce an inference result difference to be generated between the first inference model and the second inference model. Specifically, the information processing method can reduce an inference result difference to be generated when obtaining a new inference model using an inference model as an exemplar. Accordingly, the information processing method can reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

In the training of the third inference model, the third inference model may be trained by machine learning using also a second error indicating a difference between a first inference result and a second inference result, where the first inference result is additionally obtained by inputting the first data to the first inference model, and the second inference result is additionally obtained by inputting the first data to the second inference model.

According to the aspect, the third inference model is trained using also the error between an inference result for first data, which is obtained from a first inference model (a first inference result), and an inference result for the first data, which is obtained from a second inference model (a second inference result). By obtaining a new second inference model from the trained third inference model through a model conversion process, the second inference model is updated. Since the training is performed not only to reduce the difference between projection results but also to directly reduce the difference between an inference result obtained using the first inference model and an inference result obtained using the second inference model, it is possible to reduce even more an inference result difference to be generated between these two inference models.

The information processing method may further include changing the projection process to increase the first error.

According to the aspect, a projection process is changed to increase the error between a first projection result and a second projection result (a first error). This makes it possible to conduct training using a machine learning process more smoothly than the case of not changing the projection process. Stated differently, it is possible to prevent the training from stalling.
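Changing the projection process to increase the first error can be sketched as a selection among candidate projections. The candidate set, the linear form of each projection, and the mean squared error are assumptions for illustration; `choose_projection` is a hypothetical helper, not a function named in the disclosure.

```python
import numpy as np

def choose_projection(first_conversion, second_conversion, candidates):
    # Evaluate the first error under each candidate linear projection
    # and return the index and error of the candidate that maximizes it.
    errors = [
        float(np.mean((first_conversion @ m - second_conversion @ m) ** 2))
        for m in candidates
    ]
    best = int(np.argmax(errors))
    return best, errors[best]

# Made-up conversion results that differ only along the first axis.
first_conversion = np.array([[1.0, 0.0]])
second_conversion = np.array([[0.0, 0.0]])

candidates = [
    np.array([[0.0], [1.0]]),  # projects onto the second axis: difference vanishes
    np.array([[1.0], [0.0]]),  # projects onto the first axis: difference is kept
]

best_index, best_error = choose_projection(first_conversion, second_conversion, candidates)
```

In this sketch the second candidate is selected because it keeps the axis along which the two conversion results differ.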

The information processing method may further include changing the conversion process to increase the first error.

According to the aspect, a conversion process is changed to increase the error between a first projection result and a second projection result (a first error). This makes it possible to conduct training using a machine learning process more smoothly than the case of not changing the conversion process. Stated differently, it is possible to prevent the training from stalling.

The information processing method may further include changing a combination of the conversion process and the projection process to increase the first error.

According to the aspect, the combination of a conversion process and a projection process is changed to increase the error between a first projection result and a second projection result (a first error). This makes it possible to conduct training using a machine learning process more smoothly than the case of not changing at least one of the conversion process or the projection process. Stated differently, it is possible to prevent the training from stalling.

The first inference model, the second inference model, and the third inference model may be each a neural network model, and the model conversion process may include a process of compressing the neural network model.

According to the aspect, a second inference model is obtained by compressing a neural network model which is a third inference model. This can reduce an inference result difference that may be generated when obtaining a new compressed second inference model using a first inference model as an exemplar. When obtaining a new compressed inference model using an inference model as an exemplar, the information processing method can thus reduce a difference to be generated between the first inference model and the second inference model. Accordingly, even in an environment where the computing resources of, for instance, an IoT device are limited, it is possible to apply a second inference model that exhibits a behavior similar to the behavior of a first inference model while maintaining inference performance.

The process of compressing the neural network model may include a process of quantizing the neural network model.

According to the aspect, a second inference model is obtained by quantizing a neural network model which is a third inference model. It is therefore possible to compress the neural network model without changing its network configuration, thereby inhibiting any change in inference performance and inference results (behaviors) before and after the compression.

The process of quantizing the neural network model may include a process of converting a coefficient in the neural network model from a floating-point format to a fixed-point format.

According to the aspect, a second inference model is obtained by converting coefficients (weights) included in a neural network model which is a third inference model from a floating-point format to a fixed-point format. It is therefore possible to adapt the second inference model to a general embedded environment while inhibiting any change in inference performance and inference results (behaviors).
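A floating-point to fixed-point conversion of coefficients can be sketched as follows. The 16-bit signed format with 8 fractional bits, the round-to-nearest rule, and the saturation on overflow are all assumptions chosen for illustration; the disclosure does not specify a bit width or rounding mode.

```python
import numpy as np

def to_fixed_point(weights, frac_bits=8):
    # Scale by 2**frac_bits, round to the nearest integer, and saturate
    # to the range of a signed 16-bit integer (assumed storage format).
    scaled = np.round(weights * (1 << frac_bits))
    info = np.iinfo(np.int16)
    return np.clip(scaled, info.min, info.max).astype(np.int16)

def to_float(fixed, frac_bits=8):
    # Inverse mapping, used here only to inspect the quantization error.
    return fixed.astype(np.float64) / (1 << frac_bits)

# Made-up coefficients (weights) of a neural network model.
weights = np.array([0.5, -0.25, 0.1234])
fixed = to_fixed_point(weights)
restored = to_float(fixed)
```

For values inside the representable range, the rounding error of this scheme is at most half of one step, that is, 2**-9 with 8 fractional bits.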

The process of compressing the neural network model may include a process of reducing a total number of nodes in the neural network model or a process of removing a connection between nodes in the neural network model.

According to the aspect, a second inference model is obtained by reducing the number of nodes in a neural network model which is a third inference model or removing a connection between nodes in the neural network model. Since reduction in the number of nodes and the removal of the connection between nodes directly lead to reduction in the amount of computing, it is possible to adapt the second inference model to an environment where computing resources are strictly limited.
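Both kinds of compression can be sketched on a single weight matrix whose columns correspond to nodes. The magnitude-based criteria used here (a threshold for connections, the column norm for nodes) are assumptions for illustration; the disclosure does not state how the nodes or connections to remove are chosen.

```python
import numpy as np

def prune_connections(weights, threshold):
    # Remove connections by zeroing weights whose magnitude is below threshold.
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def remove_nodes(weights, keep):
    # weights has shape (inputs, nodes); keep the `keep` nodes whose
    # incoming weight vectors have the largest norm, in original order.
    norms = np.linalg.norm(weights, axis=0)
    kept = np.sort(np.argsort(norms)[-keep:])
    return weights[:, kept]

# Made-up layer with 2 inputs and 3 nodes.
weights = np.array([[0.9, 0.01, -0.5],
                    [-0.02, 0.03, 0.7]])

pruned = prune_connections(weights, threshold=0.05)
smaller = remove_nodes(weights, keep=2)
```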

The conversion process may include a process of performing scale conversion on an input.

According to the aspect, a first conversion result and a second conversion result are obtained by changing the scales of first feature information and second feature information. Since this can, for example, remove or reduce a scale difference between the first feature information and the second feature information, it is possible to clarify the difference between the first feature information and the second feature information. Stated differently, the difference can be increased. As a result, since the difference between the first projection result and the second projection result is also clarified, it is possible to conduct training even more smoothly. Stated differently, it is possible to inhibit the training from being retarded. It can be said, from another perspective, that the difference between the distribution of the first feature information and the distribution of the second feature information can be clarified. By reducing these clarified differences through training using a machine learning process, an inference result difference to be generated between two inference models can be further reduced.
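A scale conversion that makes a small difference between feature information more visible can be sketched as a division by a shared scale. Using the standard deviation of the pooled features as that scale is an assumption for illustration, as are the feature values; with this particular choice the difference grows only when the shared scale is below 1, which is the case for the small-magnitude features used here.

```python
import numpy as np

def scale_convert(features, scale):
    # Hypothetical scale conversion: divide by a scale shared by both inputs.
    return features / scale

# Made-up, small-magnitude feature information from the two inference models.
first_features = np.array([0.010, 0.030, 0.020, 0.040])
second_features = np.array([0.012, 0.028, 0.021, 0.039])

# Shared scale estimated from both sets of features (assumed choice).
shared_scale = np.concatenate([first_features, second_features]).std()

first_conversion = scale_convert(first_features, shared_scale)
second_conversion = scale_convert(second_features, shared_scale)

raw_gap = float(np.abs(first_features - second_features).sum())
converted_gap = float(np.abs(first_conversion - second_conversion).sum())
```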

The projection process may include a process of projecting input to an inner product space.

According to the aspect, a first projection result and a second projection result can be obtained by projecting a first conversion result and a second conversion result to a space where an inner product is defined. Accordingly, it is possible to define the norm between the first projection result and the second projection result, thereby training a second inference model to reduce the norm, for example. As a result, the information processing method can reduce an inference result difference to be generated between two inference models.

The projection process may include a process of reducing a total number of dimensions of input.

According to the aspect, a first projection result and a second projection result are obtained by reducing the number of dimensions of the first conversion result and the number of dimensions of the second conversion result. Accordingly, by selecting a projection axis presenting the difference between the first conversion result and the second conversion result, and then performing a process of reducing the number of dimensions other than the selected projection axis, it is possible to obtain a first projection result and a second projection result. As a result, the information processing method can further shorten the time required for calculating the error between the first projection result and the second projection result. In addition, the information processing method can effectively reduce an inference result difference to be generated between two inference models.

The process of reducing the total number of dimensions may include principal component analysis.

According to the aspect, a first projection result and a second projection result are obtained by performing principal component analysis on a first conversion result and a second conversion result and the process of reducing the number of dimensions. Since this removes one or more principal components other than at least one specific principal component, it is possible to clarify the difference between the first projection result and the second projection result. For example, a principal component for which the error (distance) between the distribution of the first projection result and the distribution of the second projection result is likely to be large compared with other principal components may be set as the specific principal component. As a result, the information processing method can shorten the time required for calculating the error between the first projection result and the second projection result. In addition, it is possible to effectively reduce an inference result difference to be generated between two inference models.

The first data may be image data.

According to the aspect, when obtaining a new inference model using, as an exemplar, an inference model for use in inference performed on image data, it is possible to reduce an inference result difference to be generated between a first inference model and a second inference model.

An information processing system according to one aspect of the present disclosure includes: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result. The second inference model is a model obtained by executing an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

According to the aspect, the information processing system can (i) execute an inference process using a new inference model generated using an existing inference model as an exemplar to reduce an inference result difference, and (ii) output the inference result. It is thus possible to utilize, instead of the existing inference model, the new inference model that produces a small inference result difference. Stated differently, the information processing system can reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

An information processing system according to one aspect of the present disclosure includes: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result. The second inference model is a model obtained by executing an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

According to the aspect, the information processing system can (i) execute an inference process using a new inference model generated using an existing inference model as an exemplar to reduce an inference result difference, and (ii) output the inference result. Stated differently, the information processing system can reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

These general or specific aspects may be implemented using a system, a device, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or any combination of systems, devices, integrated circuits, computer programs, or computer-readable recording media.

Hereinafter, certain exemplary embodiments are described in greater detail with reference to the accompanying Drawings.

Each of the exemplary embodiments described below shows a general or specific example of the present disclosure. The numerical values, shapes, materials, elements, the arrangement and connection of the elements, steps, an order of the steps, etc., shown in the following exemplary embodiments are mere examples, and therefore do not limit the scope of the appended Claims and their equivalents. Therefore, among the elements in the following exemplary embodiments, those not recited in any one of the independent claims are described as optional elements.

Embodiment 1

Embodiment 1 will describe an information processing method and an information processing system that reduce, irrespective of the combination of input data, an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar.

FIG. 1 is a block diagram illustrating the functional configuration of information processing system 10A according to Embodiment 1. Information processing system 10A is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

As illustrated in FIG. 1, information processing system 10A includes first inference unit 11A, second inference unit 12A, output converter 13A, space projector 14A, error calculator 15A, trainer 16A, and training controller 17A.

Information processing system 10A is implemented by, for example, a processor (e.g., a central processing unit (CPU)) executing a program stored in memory in a computer device including the processor and the memory. Information processing system 10A may be implemented by a single device or a plurality of devices that are mutually communicable.

First inference unit 11A and second inference unit 12A each infer data that has been input (also referred to as input data) using an inference model. The inference model is, for example, a neural network model. The input data is, for example, image data. Hereinafter, description is provided assuming that input data is image data, but input data does not necessarily need to be limited to an example in which input data is image data. For example, audio data output from a microphone, point cloud data output from a radar such as light detection and ranging (LiDAR), compression data output from a compression sensor, temperature data output from a temperature sensor, moisture data output from a moisture sensor, or sensing data such as aroma data output from an aroma sensor may be used as input data. Input data is equivalent to first data.

First inference unit 11A obtains network A as a neural network used for an inference model that infers input data. More specifically, first inference unit 11A obtains coefficients included in network A. An inference model that uses network A is equivalent to “an existing inference model” and is also referred to as a first inference model.

First inference unit 11A outputs feature information (also referred to as first feature information) and an inference result obtained by inputting input data to an inference model that uses network A (also referred to as a first inference result).

Second inference unit 12A obtains network B as a neural network used for an inference model that infers input data. More specifically, second inference unit 12A obtains coefficients included in network B. An inference model that uses network B is equivalent to a new inference model trained to output the same inference result as that obtained using an existing inference model, and is also referred to as a second inference model. The inference model that uses network B is trained by trainer 16A to output the same inference result as that obtained using the inference model that uses network A, as will be described later.

Second inference unit 12A outputs feature information (also referred to as second feature information) and an inference result obtained by inputting input data to an inference model that uses network B (also referred to as a second inference result).

As used herein, an inference result is information indicating the result of inferring input data and includes, for example, information indicating an object or conditions shown in image data, or an attribute thereof. The inference result may include a feature which is information indicating a feature of input data. The inference result may be intermediate data obtained in the middle of processing performed by an inference model or the feature may be the intermediate data.

It is assumed herein that the feature is intermediate data of processing performed by an inference model. In other words, it is assumed that feature information is intermediate output of an inference model. For example, when input data is image data, feature information is a feature map indicating a feature of the image data. The inference model may be a model that outputs feature information as final output.

Output converter 13A obtains feature information output by first inference unit 11A and second inference unit 12A, and converts the obtained feature information using a conversion process. More specifically, output converter 13A obtains first feature information from first inference unit 11A and second feature information from second inference unit 12A. Output converter 13A then converts each of the obtained first feature information and second feature information using the conversion process, and obtains conversion results regarding the feature information input. In other words, output converter 13A outputs a conversion result which is the result of converting the first feature information using the conversion process (also referred to as a first conversion result) and a conversion result which is the result of converting the second feature information using the conversion process (also referred to as a second conversion result).

The conversion process produces an error between (i) a projection result indicating the result of projecting the first conversion result using a projection process performed by space projector 14A to be described later (also referred to as a first projection result), and (ii) a projection result indicating the result of projecting the second conversion result using the projection process (also referred to as a second projection result), which is greater than the error between (iii) a projection result indicating the result of projecting the first feature information (also referred to as a first non-conversion projection result), and (iv) a projection result indicating the result of projecting the second feature information (also referred to as a second non-conversion projection result).

Space projector 14A obtains conversion results output by output converter 13A and projects the obtained conversion results using a projection process. More specifically, space projector 14A obtains the first conversion result and the second conversion result from output converter 13A. Space projector 14A then projects, using the projection process, each of the conversion results obtained from output converter 13A, and obtains projection results regarding the conversion results that have been input. In other words, space projector 14A outputs a projection result which is the result of projecting the first conversion result using the projection process (also referred to as a first projection result), and a projection result which is the result of projecting the second conversion result using the projection process (also referred to as a second projection result).

Error calculator 15A obtains projection results output by space projector 14A and calculates the error between the obtained projection results. More specifically, error calculator 15A obtains the first projection result and the second projection result output by space projector 14A. Error calculator 15A then calculates error information indicating the difference between the obtained first projection result and the obtained second projection result (also referred to as a first error). The error information is computed using a loss function held by error calculator 15A. The loss function is, for example, the norm (distance) between projection results in a projection space, and the norm is calculated, for example, using a function utilizing the sum of squared errors between sets of coordinates each indicating a different one of the projection results. An error calculation method is not limited to the above example.

Trainer 16A trains an inference model that uses network B by machine learning. Trainer 16A obtains the first error calculated by error calculator 15A and trains the inference model that uses network B by machine learning to reduce the first error. More specifically, trainer 16A refers to a loss function held by error calculator 15A and updates coefficients included in network B to reduce the first error. A well-known technique such as a norm using a sum of squared errors may be employed for the loss function.

Training controller 17A controls the training of an inference model that uses a neural network. More specifically, training controller 17A determines whether the difference between the behavior of network A and the behavior of network B updated by trainer 16A reaches required performance, and decides whether to train the inference model that uses network B based on the determination result. For example, training controller 17A obtains a first inference result output by first inference unit 11A and a second inference result output by second inference unit 12A having obtained network B updated by trainer 16A, and determines whether the difference between the first inference result and the second inference result is less than an allowed value.

When determining that the difference between the behavior of network A and the behavior of network B reaches the required performance, for example, training controller 17A ends the training of the inference model that uses network B. More specifically, training controller 17A ends the training when the difference between the first inference result and the second inference result is less than the allowed value.

When determining that the difference between the behavior of network A and the behavior of network B does not reach the required performance, for example, training controller 17A continues the training of the inference model that uses network B. In this case, training controller 17A further trains the inference model that uses network B by, for example, causing each of first inference unit 11A and second inference unit 12A to input new input data and causing first inference unit 11A, second inference unit 12A, output converter 13A, space projector 14A, error calculator 15A, and trainer 16A to execute the above processing again using network A, new network B, and new inputs.

Hereinafter, the outline of updating network B performed by information processing system 10A will be described.

FIG. 2 is a diagram illustrating training conducted by second inference unit 12A in information processing system 10A according to Embodiment 1.

When input data is input, first inference unit 11A executes an inference process of inferring an image using an inference model that uses network A, and outputs feature information which is intermediate output. The feature information is, for example, an intermediate feature map in a neural network. The intermediate feature map includes a feature indicating a feature of image data. The same applies to the following description. The feature information output by first inference unit 11A is provided for output converter 13A.

When input data is input, second inference unit 12A executes an inference process of inferring an image using an inference model that uses network B, and outputs feature information which is intermediate output. The feature information is the same kind of information as the feature information output by first inference unit 11A. The feature information output by second inference unit 12A is provided for output converter 13A.

Output converter 13A performs a conversion process on the feature information provided by first inference unit 11A and the feature information provided by second inference unit 12A. The conversion process is, for example, scale conversion of changing the range of a value indicated by the feature information. For example, conversion process f is linear scale conversion obtained by the following Equation 1 where x denotes input, and a denotes a coefficient used for the scale conversion.

f(x)=a×x  Equation 1

The conversion process is not limited to the above example. With the conversion process, error E1 between a first projection result and a second projection result which are to be obtained through the following projection process becomes greater than error E0 between a first non-conversion projection result and a second non-conversion projection result which are to be obtained without the conversion process. Stated differently, such a conversion process (e.g., linear scale conversion coefficients) that produces error E1 greater than error E0 is set.
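The relation between error E1 and error E0 can be checked numerically with a small sketch. The feature values, the coefficient a = 4.0, the use of a projection that simply keeps the first m coordinates, and the use of the sum of squares as the error are assumptions for illustration only. Under these assumptions the projection is linear, so E1 equals a² times E0, and any coefficient with a magnitude greater than 1 yields E1 greater than E0.

```python
def scale_convert(x, a):
    """Linear scale conversion f(x) = a * x applied to each element (Equation 1)."""
    return [a * v for v in x]


def project(x, m):
    """Assumed placeholder projection: keep the first m coordinates."""
    return x[:m]


def squared_error(u, v):
    """Sum of squared coordinate differences between two projection results."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


# Hypothetical first and second feature information.
first_feature = [0.10, 0.20, 0.30]
second_feature = [0.12, 0.17, 0.35]
a = 4.0

# E0: error without the conversion process.
e0 = squared_error(project(first_feature, 2), project(second_feature, 2))
# E1: error with the conversion process applied before projection.
e1 = squared_error(project(scale_convert(first_feature, a), 2),
                   project(scale_convert(second_feature, a), 2))
```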

Space projector 14A performs a projection process on conversion results output by output converter 13A. The projection process is, for example, a process of reducing the number of dimensions of input or a principal component analysis process. In the case of principal component analysis, at least one specific principal component is selected, and one or more principal components other than the at least one selected principal component are removed. In other words, when performing the above process on n-dimensional input to obtain an m-dimensional projection result (n>m), space projector 14A obtains input x=(x1, x2, . . . , xn) and outputs projection result y=(y1, y2, . . . , ym). The projection process is not limited to the above example.
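A projection from n dimensions to m = 1 dimension using principal component analysis might look like the sketch below. It finds the leading principal axis by power iteration on the covariance matrix and keeps only the coordinate along that axis. Choosing the largest-variance component, fitting the axis on a single pooled sample set, and all names are assumptions for illustration; the disclosure leaves open which principal component is selected and how.

```python
import math


def first_principal_axis(samples, iters=200):
    """Return (mean, unit axis) of the leading principal component.

    Uses power iteration on the sample covariance matrix. The start vector
    of all ones is an assumption; it must not be orthogonal to the axis.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def project_onto_axis(x, mean, axis):
    """Project an n-dimensional input onto one principal axis (m = 1)."""
    return sum((xi - mi) * ai for xi, mi, ai in zip(x, mean, axis))


# Hypothetical 2-dimensional conversion results lying on the line x2 = 2 * x1.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, axis = first_principal_axis(samples)
y = project_onto_axis([5.0, 10.0], mean, axis)
```

Here the recovered axis points along (1, 2), so the single retained coordinate preserves all of the variation in the samples while the removed component carries none.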

Error calculator 15A calculates the error between the projection results output by space projector 14A. The error is, for example, the norm (distance) between projection results in a projection space, and the norm is calculated using, for example, a function utilizing the sum of squares error between sets of coordinates each indicating a different one of the projection results. In other words, when first projection result y1=(y11, y12, . . . , y1m) and second projection result y2=(y21, y22, . . . , y2m) are output as the projection results output by space projector 14A, error calculator 15A calculates the sum of squares error between projection result y1 and projection result y2 using the following Expression 2. An error calculation method is not limited to the above example.

(y11−y21)²+(y12−y22)²+ . . . +(y1m−y2m)²  Expression 2
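Expression 2 translates directly into a few lines of code. The function name and the sample coordinates below are hypothetical and serve only to show the calculation.

```python
def sum_of_squares_error(y1, y2):
    """Compute (y11 - y21)^2 + (y12 - y22)^2 + ... + (y1m - y2m)^2 (Expression 2)."""
    if len(y1) != len(y2):
        raise ValueError("projection results must have the same number of dimensions")
    return sum((a - b) ** 2 for a, b in zip(y1, y2))


# Hypothetical first and second projection results with m = 3.
first_projection = [1.0, 2.0, 3.0]
second_projection = [1.0, 0.0, 5.0]
error = sum_of_squares_error(first_projection, second_projection)
```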

Trainer 16A adjusts coefficients included in network B to reduce an error to be calculated by error calculator 15A. In this case, trainer 16A refers to a loss function and adjusts the coefficients to reduce the error through the coefficient adjustment. Trainer 16A thus updates network B by adjusting the coefficients in network B. The following describes processing executed by information processing system 10A configured as described above.

FIG. 3 is a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10A according to Embodiment 1.

In step S101, first inference unit 11A inputs input data to an inference model that uses network A, and obtains first feature information via network A.

In step S102, second inference unit 12A inputs input data to an inference model that uses network B, and obtains second feature information via network B.

In step S103, output converter 13A performs a conversion process on the first feature information obtained by first inference unit 11A via network A in step S101, to obtain a first conversion result.

In step S104, output converter 13A performs the conversion process on the second feature information obtained by second inference unit 12A via network B in step S102, to obtain a second conversion result.

In step S105, space projector 14A performs a projection process on the first conversion result obtained by output converter 13A in step S103, to obtain a first projection result.

In step S106, space projector 14A performs the projection process on the second conversion result obtained by output converter 13A in step S104, to obtain a second projection result.

In step S107, error calculator 15A calculates error E1 between the first projection result obtained by space projector 14A in step S105 and the second projection result obtained by space projector 14A in step S106.

In step S108, trainer 16A updates coefficients in network B using error E1 calculated in step S107, to reduce error E1.

In step S109, training controller 17A determines whether the behavior difference between network A and network B updated by trainer 16A reaches required performance that is predetermined. In other words, training controller 17A determines whether the difference between an inference result obtained using the inference model that uses network A and an inference result obtained using the inference model that uses network B is less than an allowed value. When the difference reaches the required performance, information processing system 10A ends the processing. When the difference does not reach the required performance, information processing system 10A returns to the process in step S102 and repeats the same sequence of processes as described above.
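The sequence of steps S101 to S109 can be sketched as a toy training loop. Everything concrete here is an assumption for illustration: each network is reduced to a list of coefficients that multiply a scalar input, the projection is the identity embedding into Euclidean space, the update is plain gradient descent, and the feature information doubles as the inference result. Because each round feeds new input data, this sketch recomputes the output of network A every round.

```python
def features(weights, x):
    """Toy network: feature i is weights[i] * x (assumed model, not the real one)."""
    return [w * x for w in weights]


def convert(feat, a):
    """Conversion process: linear scale conversion f(x) = a * x."""
    return [a * v for v in feat]


def project(conv):
    """Assumed trivial projection into Euclidean space (keeps every coordinate)."""
    return list(conv)


def train(net_a, net_b, inputs, a=2.0, lr=0.05, allowed=1e-3, max_rounds=1000):
    """Update net_b until its output differs from net_a by less than allowed."""
    net_b = list(net_b)
    for round_no in range(max_rounds):
        x = inputs[round_no % len(inputs)]
        p1 = project(convert(features(net_a, x), a))  # S101, S103, S105
        p2 = project(convert(features(net_b, x), a))  # S102, S104, S106
        # S107: error E1 is sum((p2[i] - p1[i]) ** 2); its gradient with
        # respect to net_b[i] is 2 * (p2[i] - p1[i]) * a * x.
        # S108: gradient-descent update of the coefficients of network B.
        net_b = [w - lr * 2.0 * (v2 - v1) * a * x
                 for w, v1, v2 in zip(net_b, p1, p2)]
        # S109: compare inference results against the allowed value.
        diff = max(abs(u - v)
                   for u, v in zip(features(net_a, x), features(net_b, x)))
        if diff < allowed:
            return net_b, round_no + 1
    return net_b, max_rounds


# Hypothetical coefficients and input data.
network_a = [1.0, -0.5]
trained_b, rounds = train(network_a, [0.0, 0.0], inputs=[0.5, 1.0, 1.5])
```

In this toy setting the scale coefficient a multiplies the gradient by a², so a larger a produces larger updates for the same learning rate, which is one way to read the statement that training proceeds more smoothly with error E1 than with error E0.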

Through the sequence of the processes described above, information processing system 10A performs a conversion process on first feature information obtained via network A and second feature information obtained via network B so that error E1 between a first projection result and a second projection result is greater than error E0 between a first non-conversion projection result and a second non-conversion projection result. As a result, it is possible to conduct training more smoothly than the case of using error E0 for training using a machine learning process. Moreover, information processing system 10A trains an inference model that uses network B to reduce error E1. As a result, the inference model that uses network B is trained to output the same inference result as that obtained using an inference model that uses network A. Information processing system 10A is thus capable of reducing an inference result difference that may be generated when obtaining a new inference model using the inference model that uses network A as an exemplar. Accordingly, it is possible to reduce, irrespective of the combination of input data, an inference result difference to be generated between the inference model that uses network A and the inference model that uses network B.

Next, inference system 20A that uses network B obtained using information processing system 10A will be described. An inference system is also referred to as an information processing system.

FIG. 4 is a block diagram illustrating the functional configuration of inference system 20A according to Embodiment 1.

As illustrated in FIG. 4, inference system 20A includes obtainer 21A and second inference unit 22A.

Inference system 20A is implemented by, for example, a processor (e.g., a CPU) executing a program stored in memory in a computer device including the processor and the memory.

Inference system 20A may be implemented by a single device or a plurality of devices that are mutually communicable.

Obtainer 21A obtains data that has been input (also referred to as input data). The input data is, for example, image data, as is the case of data to be input to information processing system 10A. Hereinafter, description is provided assuming that input data is image data, but input data does not necessarily need to be limited to an example in which input data is image data, as is the case of information processing system 10A.

Obtainer 21A provides the obtained input data for second inference unit 22A. Input data is equivalent to second data.

Second inference unit 22A inputs the input data obtained by obtainer 21A to an inference model (equivalent to a second inference model), and obtains and outputs an inference result. The inference model used by second inference unit 22A to obtain the inference result is an inference model that uses network B and has been trained by information processing system 10A.

FIG. 5 is a flowchart illustrating processing executed by inference system 20A according to Embodiment 1.

In step S201, obtainer 21A obtains input data.

Second inference unit 22A inputs the input data obtained by obtainer 21A to an inference model in step S202, and obtains and outputs an inference result in step S203.
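Steps S201 to S203 amount to a short pipeline, sketched below. The class names mirror the elements of inference system 20A, but the weighted-sum model standing in for trained network B, the in-memory data source, and the method names are assumptions for illustration.

```python
class Obtainer:
    """Obtains input data (second data) one item at a time from an assumed source."""

    def __init__(self, source):
        self._source = iter(source)

    def obtain(self):
        return next(self._source)


class SecondInferenceUnit:
    """Holds trained coefficients and outputs an inference result.

    A weighted sum is used as a stand-in for the inference model that uses
    network B; the real model is a trained neural network.
    """

    def __init__(self, coefficients):
        self._coefficients = list(coefficients)

    def infer(self, data):
        return sum(w * v for w, v in zip(self._coefficients, data))


# Hypothetical input source and trained coefficients.
obtainer = Obtainer([[1.0, 2.0], [0.5, 0.5]])
unit = SecondInferenceUnit([2.0, 1.0])

data = obtainer.obtain()   # step S201
result = unit.infer(data)  # steps S202 and S203
```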

Inference system 20A is thus capable of executing an inference process using a new inference model generated using an existing inference model as an exemplar to reduce an inference result difference, and outputting the inference result.

As described above, the information processing method according to Embodiment 1 performs a conversion process on first feature information and second feature information so that the error between a first projection result and a second projection result is greater than the error between a first non-conversion projection result and a second non-conversion projection result. As a result, it is possible to conduct training more smoothly than the case of using the error between a first non-conversion projection result and a second non-conversion projection result for training using a machine learning process. In addition, the information processing method trains a second inference model to reduce the error between a first projection result and a second projection result. As a result, a second inference model is trained to output the same inference result as that obtained using a first inference model. The information processing method can thus reduce an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar. Accordingly, it is possible to reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data. When obtaining a new inference model using, as an exemplar, an inference model used for inferring image data, it is possible to reduce an inference result difference to be generated between these two inference models.

With an information processing system according to Embodiment 1, it is possible to (i) execute an inference process using a new inference model generated using an existing inference model as an exemplar to reduce an inference result difference, and (ii) output the inference result. It is thus possible to utilize, instead of the existing inference model, the new inference model that produces a small inference result difference. Stated differently, the information processing system is capable of reducing an inference result difference to be generated between two inference models, irrespective of the combination of input data.

A first conversion result and a second conversion result are obtained by changing the scales of first feature information and second feature information. Accordingly, it is possible to remove or reduce a scale difference between the first feature information and the second feature information, thereby clarifying the difference between the first feature information and the second feature information. Stated differently, the difference can be increased.

As a result, since the difference between a first projection result and a second projection result is also clarified, it is possible to conduct training more smoothly. Stated differently, it is possible to keep training from stalling. It can also be said, from another perspective, that the difference between the distribution of the first feature information and the distribution of the second feature information can be clarified. By reducing these clarified differences through training using a machine learning process, it is possible to further reduce an inference result difference to be generated between two inference models.

Embodiment 2

Embodiment 2 describes an information processing method and aninformation processing system that are different from the informationprocessing method and the information processing system according toEmbodiment 1, and that reduce, irrespective of the combination of inputdata, an inference result difference that may be generated whenobtaining a new inference model using an inference model as an exemplar.

Hereinafter, an information processing system according to Embodiment 2 configured by modifying part of information processing system 10A according to Embodiment 1 will be described.

Elements of the information processing system according to Embodiment 2 that are the same as those included in information processing system 10A according to Embodiment 1 have already been described; they are therefore assigned like reference signs, and detailed description thereof is omitted. The following focuses on the differences from information processing system 10A.

FIG. 6 is a block diagram illustrating the functional configuration of information processing system 10B according to Embodiment 2. Information processing system 10B is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

The format of an existing inference model is different from the format of a new inference model. Specifically, network coefficients composing the existing inference model are expressed using a floating-point format whereas network coefficients composing the new inference model are expressed using a fixed-point format. In this case, it can be said, for example, that information processing system 10B is a system for obtaining a new network expressed using the fixed-point format by quantizing an existing inference model expressed using the floating-point format.

As illustrated in FIG. 6, information processing system 10B includes first inference unit 11A, second inference unit 12B, output converter 13A, space projector 14A, error calculator 15A, trainer 16B, training controller 17A, and converter 18B.

Among the elements included in information processing system 10B, first inference unit 11A, output converter 13A, space projector 14A, error calculator 15A, and training controller 17A are the same as those included in information processing system 10A according to Embodiment 1. The following therefore describes second inference unit 12B, trainer 16B, and converter 18B in detail.

Second inference unit 12B performs inference on input data using an inference model, as is the case with second inference unit 12A according to Embodiment 1.

Second inference unit 12B obtains network B as a neural network used for an inference model that performs inference on input data, as is the case with second inference unit 12A according to Embodiment 1. More specifically, second inference unit 12B obtains coefficients included in network B. An inference model that uses network B is equivalent to a new inference model trained to output the same inference result as that obtained using an existing inference model, and is also referred to as a second inference model.

Second inference unit 12B differs from second inference unit 12A according to Embodiment 1 in the following respect: (A) second inference unit 12A according to Embodiment 1 obtains a network that is not subjected to network conversion, whereas (B) second inference unit 12B obtains network B that has been converted by converter 18B, to be described later, and that uses a format different from that of network A used for an existing inference model. Second inference unit 12B outputs feature information (also referred to as second feature information) and an inference result (also referred to as a second inference result) obtained by inputting input data to an inference model that uses network B.

Trainer 16B trains an inference model that uses network B1 (also referred to as a third inference model) by machine learning. Network B1 is a network that uses the same format as that of network A used for an existing inference model. In other words, network B1 is a network that uses a format different from that of network B. Trainer 16B obtains a first error calculated by error calculator 15A and trains the inference model that uses network B1 by machine learning to reduce the first error. More specifically, trainer 16B refers to a loss function held by error calculator 15A, and updates coefficients included in network B1 to reduce the first error. The loss function is the same as that described in Embodiment 1.

Converter 18B obtains network B by performing a model conversion process on the coefficients in network B1. More specifically, converter 18B obtains network B1 trained by trainer 16B, and obtains network B by performing a predetermined model conversion process on the coefficients in network B1.

The model conversion process includes, for example, a process of compressing network B1. The compressing process includes, for example, a process of quantizing network B1. When network B1 is a neural network, for example, the quantizing process may include a process of converting coefficients in a neural network model from a floating-point format to a fixed-point format. The compressing process may include a process of reducing the number of nodes in the neural network model or removing a connection between nodes in the neural network model.
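
As a minimal sketch of the quantizing process described above, the following Python example converts floating-point coefficients to a signed fixed-point representation. The function names, the 16-bit word length, and the 8 fractional bits are assumptions chosen for illustration; the present disclosure does not specify them.

```python
def quantize_to_fixed_point(coefficients, frac_bits=8, total_bits=16):
    """Convert float coefficients to signed fixed-point integers.

    frac_bits and total_bits are illustrative assumptions.
    """
    scale = 1 << frac_bits                      # 2 ** frac_bits
    q_min = -(1 << (total_bits - 1))            # smallest representable integer
    q_max = (1 << (total_bits - 1)) - 1         # largest representable integer
    quantized = []
    for c in coefficients:
        q = int(round(c * scale))               # round to the nearest step
        q = max(q_min, min(q_max, q))           # saturate on overflow
        quantized.append(q)
    return quantized


def dequantize(quantized, frac_bits=8):
    """Map fixed-point integers back to floats for comparison."""
    scale = 1 << frac_bits
    return [q / scale for q in quantized]


# Hypothetical coefficients standing in for network B1 (floating point).
network_b1 = [0.5, -0.123, 1.75]
# Coefficients standing in for network B (fixed point).
network_b = quantize_to_fixed_point(network_b1)
restored = dequantize(network_b)
```

In this sketch, each coefficient that stays within the representable range changes by at most half of one quantization step (1/512 with 8 fractional bits), which is the kind of small behavioral change that the training of network B1 is intended to absorb.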

FIG. 7 is a diagram illustrating training conducted by second inference unit 12B in information processing system 10B according to Embodiment 2.

A process from when input data is input by first inference unit 11A until when an error is calculated by error calculator 15A is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

After the error is calculated by error calculator 15A, trainer 16B adjusts the coefficients included in network B1 to reduce an error to be calculated by error calculator 15A. In this case, trainer 16B refers to a loss function and adjusts the coefficients so that the error is reduced. Trainer 16B thus updates network B1 by adjusting the coefficients in network B1.

Converter 18B obtains network B1 trained by trainer 16B and obtains new network B by performing a conversion process on the coefficients in network B1.

The following describes processing executed by information processing system 10B configured as described above.

FIG. 8 is a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10B according to Embodiment 2.

The processes included in steps S101 through S107 and S109 illustrated in FIG. 8 are the same processes as those performed by information processing system 10A according to Embodiment 1 (see FIG. 3, for instance).

In step S121, trainer 16B updates coefficients in network B1 to reduce error E1, using error E1 calculated by error calculator 15A in step S107.

In step S122, converter 18B obtains network B1 whose coefficients have been updated by trainer 16B in step S121, and obtains network B by converting the coefficients in network B1. In step S123, converter 18B updates, with network B obtained in step S122, network B input to second inference unit 12B.

Through a sequence of the processes described above, information processing system 10B trains an inference model that uses network B1, to reduce the error between a first projection result and a second projection result. Information processing system 10B then updates network B by obtaining network B from trained network B1 through a model conversion process. As a result, an inference model that uses network B is trained to output the same inference result as that obtained using an inference model that uses network A. Information processing system 10B is thus capable of reducing an inference result difference that may be generated when obtaining an inference model that uses network B using an inference model that uses network A as an exemplar.

As described above, the information processing method according to Embodiment 2 performs a conversion process on first feature information and second feature information so that the error between a first projection result and a second projection result is greater than the error between a first non-conversion projection result and a second non-conversion projection result. As a result, it is possible to conduct training more smoothly than in the case of using the error between a first non-conversion projection result and a second non-conversion projection result for training using a machine learning process. Moreover, the information processing method trains a third inference model to reduce the error between a first projection result and a second projection result. Subsequently, the second inference model is updated by obtaining a second inference model from the trained third inference model through a model conversion process. It can be said that, as a result, the second inference model is indirectly trained to output the same inference result as that obtained using a first inference model. In other words, the information processing method can reduce an inference result difference that may be generated between the first inference model and the second inference model. Specifically, it is possible to reduce an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar. Accordingly, it is possible to reduce an inference result difference to be generated between two inference models, irrespective of the combination of input data.

A second inference model is obtained by compressing a neural network model which is a third inference model. Accordingly, it is possible to reduce an inference result difference that may be generated when obtaining a new compressed second inference model using a first inference model as an exemplar. Therefore, when obtaining a new compressed inference model using an inference model as an exemplar, the information processing method can reduce a difference generated between these two inference models. Accordingly, it is possible to apply, while maintaining inference performance, a second inference model whose behavior is similar to the behavior of a first inference model even in an environment where the computing resources of, for instance, an IoT device are limited.

A second inference model is obtained by quantizing a neural network model which is a third inference model. It is therefore possible to compress the neural network model without changing its network configuration and inhibit any change in inference performance and inference results (behaviors) before and after compression.

A second inference model is obtained by converting coefficients (weights) in a neural network model which is a third inference model from a floating-point format to a fixed-point format. It is therefore possible to adapt the second inference model to a general embedded environment while inhibiting any change in inference performance and inference results (behaviors).

A second inference model is obtained by reducing the number of nodes in a neural network model which is a third inference model or removing a connection between nodes in the neural network model. Since reduction in the number of nodes and the removal of the connection between nodes directly lead to reduction in the amount of computing, it is possible to adapt the second inference model to an environment where computing resources are strictly limited.

An inference model obtained using the configuration according to Embodiment 2 may be utilized in the inference system according to Embodiment 1. In this case, the inference system is capable of executing an inference process using a new inference model generated using an existing inference model as an exemplar to reduce an inference result difference, and outputting the inference result.

Embodiment 3

Embodiment 3 describes an information processing method and an information processing system that are different from the information processing method and the information processing system according to Embodiment 1 or Embodiment 2, and that reduce, irrespective of the combination of input data, an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar.

Hereinafter, an information processing system according to Embodiment 3 configured by modifying part of information processing system 10A according to Embodiment 1 will be described.

Elements of the information processing system according to Embodiment 3 that are the same as those included in information processing system 10A according to Embodiment 1 have already been described; they are therefore assigned like reference signs, and detailed description thereof is omitted. The following focuses on the differences from information processing system 10A.

FIG. 9 is a block diagram illustrating the functional configuration of information processing system 10C according to Embodiment 3. Information processing system 10C is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

As illustrated in FIG. 9, information processing system 10C includes first inference unit 11C, second inference unit 12C, output converter 13A, space projector 14A, error calculator 15A, trainer 16C, training controller 17A, and second error calculator 19C.

Among the elements included in information processing system 10C, output converter 13A, space projector 14A, error calculator 15A, and training controller 17A are the same as those included in information processing system 10A according to Embodiment 1. The following therefore describes first inference unit 11C, second inference unit 12C, trainer 16C, and second error calculator 19C in detail.

First inference unit 11C also has the following function in addition to functions that are the same as those of first inference unit 11A according to Embodiment 1. Specifically, first inference unit 11C provides second error calculator 19C with an inference result obtained by inputting input data to an inference model that uses network A (also referred to as a first inference result).

Second inference unit 12C also has the following function in addition to functions that are the same as those of second inference unit 12A according to Embodiment 1. Specifically, second inference unit 12C provides second error calculator 19C with an inference result obtained by inputting input data to an inference model that uses network B (also referred to as a second inference result).

The description herein assumes that (1) first inference unit 11C outputs a first inference result to second error calculator 19C and first feature information to output converter 13A, and second inference unit 12C outputs a second inference result to second error calculator 19C and second feature information to output converter 13A. Alternatively, (2) first inference unit 11C may output first feature information to second error calculator 19C and a first inference result to output converter 13A, and second inference unit 12C may output second feature information to second error calculator 19C and a second inference result to output converter 13A; (3) first inference unit 11C may output a first inference result to both second error calculator 19C and output converter 13A, and second inference unit 12C may output a second inference result to both second error calculator 19C and output converter 13A; or (4) first inference unit 11C may output first feature information to both second error calculator 19C and output converter 13A, and second inference unit 12C may output second feature information to both second error calculator 19C and output converter 13A.

When first inference unit 11C and second inference unit 12C output a first inference result and a second inference result, respectively, to output converter 13A, output converter 13A obtains the first inference result and the second inference result and outputs: a first conversion result which is the result of converting the first inference result using a conversion process; and a second conversion result which is the result of converting the second inference result using the conversion process.

Second error calculator 19C calculates the error between the inference result output by first inference unit 11C and the inference result output by second inference unit 12C. In other words, second error calculator 19C calculates error information (also referred to as a second error) indicating the difference between the first inference result output by first inference unit 11C and the second inference result output by second inference unit 12C. The error information is calculated using a loss function held by second error calculator 19C. The loss function may be the same as that held by error calculator 15A according to Embodiment 1.

When first inference unit 11C and second inference unit 12C output the first feature information and the second feature information, respectively, to second error calculator 19C, second error calculator 19C calculates a second error indicating the difference between the first feature information and the second feature information.

Trainer 16C trains an inference model that uses network B by machine learning. Trainer 16C obtains the first error calculated by error calculator 15A and the second error calculated by second error calculator 19C, and trains the inference model that uses network B by machine learning to reduce the first error and the second error. More specifically, trainer 16C refers to loss functions held by error calculator 15A and second error calculator 19C, and updates coefficients included in network B to reduce the first error and the second error. A well-known technique such as a norm based on a sum-of-squares error may be employed for the loss functions.

FIG. 10 is a diagram illustrating training conducted by second inference unit 12C in information processing system 10C according to Embodiment 3.

A process from when input data is input by first inference unit 11C until when an error is calculated by error calculator 15A is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

When input data is input, first inference unit 11C executes an inference process of inferring an image using an inference model that uses network A, and outputs the inference result. The inference result is, for example, information indicating “dog: 70%, cat: 30%”. The inference result indicates that a probability that an object in an input image is a dog is 70% and a probability that the object is a cat is 30%. The same applies to the following description. The inference result output by first inference unit 11C is provided to second error calculator 19C.

When input data is input, second inference unit 12C executes an inference process of inferring an image using an inference model that uses network B, and outputs the inference result. The inference result is the same kind of information as that output by first inference unit 11C. The inference result output by second inference unit 12C is provided to second error calculator 19C.

Second error calculator 19C calculates the error between the inference result output by first inference unit 11C and the inference result output by second inference unit 12C. Specifically, when information “dog: 70%, cat: 30%” is obtained as an inference result using the inference model that uses network A and information “dog: 60%, cat: 40%” is obtained as an inference result using the inference model that uses network B, second error calculator 19C obtains an error calculated from 0.02, which is the sum of 0.01, the square of the probability difference related to a dog (0.7 - 0.6) in the inference results, and 0.01, the square of the probability difference related to a cat (0.3 - 0.4) in the inference results.
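
The arithmetic in the example above can be reproduced with the following short Python sketch. The function name and the dictionary representation of the inference results are assumptions made for illustration only.

```python
def sum_of_squares_error(first_result, second_result):
    """Sum the squared per-class probability differences."""
    return sum(
        (first_result[label] - second_result[label]) ** 2
        for label in first_result
    )


# Inference results from the example: network A and network B.
first_inference_result = {"dog": 0.7, "cat": 0.3}
second_inference_result = {"dog": 0.6, "cat": 0.4}

# (0.7 - 0.6)^2 + (0.3 - 0.4)^2, approximately 0.02 in floating point.
error_e2 = sum_of_squares_error(first_inference_result, second_inference_result)
```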

Trainer 16C adjusts coefficients included in network B to reduce errors to be calculated by error calculator 15A and second error calculator 19C. Trainer 16C refers to the loss functions and adjusts the coefficients so that the errors are reduced.

Trainer 16C thus updates network B by adjusting the coefficients in network B.

The following describes processing executed by information processing system 10C configured as described above.

FIG. 11 is a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10C according to the present embodiment.

The processes included in steps S101 through S107 and S109 illustrated in FIG. 11 are the same processes as those performed by information processing system 10A according to Embodiment 1 (see FIG. 3, for instance).

In step S141, first inference unit 11C inputs input data to an inference model that uses network A, and obtains a first inference result via network A.

In step S142, second inference unit 12C inputs input data to an inference model that uses network B, and obtains a second inference result via network B.

In step S143, second error calculator 19C calculates error E2 between the first inference result obtained by first inference unit 11C in step S141 and the second inference result obtained by second inference unit 12C in step S142.

In step S108C, trainer 16C updates coefficients in network B to reduce error E1 calculated by error calculator 15A in step S107 and error E2 calculated by second error calculator 19C in step S143.
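
One way to picture the update in step S108C is the toy Python sketch below, which reduces the sum of two errors by gradient descent. Everything in it is an assumption for illustration: each network is reduced to a single coefficient, the projection result and the inference result are stand-in linear functions, the two errors are simply added, and the gradient is estimated numerically. The present disclosure does not prescribe how error E1 and error E2 are combined.

```python
COEFF_A = 2.0   # hypothetical single coefficient standing in for network A
INPUT_X = 1.0   # hypothetical input data


def total_error(coeff_b):
    """Return E1 + E2 for a toy model with one coefficient."""
    # Stand-in for the error between projection results (error E1).
    e1 = (COEFF_A * INPUT_X - coeff_b * INPUT_X) ** 2
    # Stand-in for the error between inference results (error E2).
    e2 = (0.5 * COEFF_A * INPUT_X - 0.5 * coeff_b * INPUT_X) ** 2
    return e1 + e2


def update_coefficient(coeff_b, learning_rate=0.1, eps=1e-6):
    """One gradient-descent step using a central-difference gradient."""
    grad = (total_error(coeff_b + eps) - total_error(coeff_b - eps)) / (2 * eps)
    return coeff_b - learning_rate * grad


coeff_b = 0.0                      # initial coefficient of the toy network B
error_before = total_error(coeff_b)
for _ in range(20):                # repeat the update a fixed number of times
    coeff_b = update_coefficient(coeff_b)
error_after = total_error(coeff_b)
```

In this toy setting both errors shrink together as the coefficient of network B approaches that of network A; an actual implementation would update all coefficients of network B, typically by backpropagation.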

Through a sequence of the processes described above, information processing system 10C trains an inference model that uses network B to reduce error E1 between a first projection result and a second projection result. Information processing system 10C also trains the inference model that uses network B to reduce error E2 between a first inference result and a second inference result. As a result, the inference model that uses network B is further trained to output the same inference result as that obtained using an inference model that uses network A. Information processing system 10C is thus capable of reducing an inference result difference that may be generated when obtaining a new inference model using the inference model that uses network A as an exemplar. Accordingly, it is possible to further reduce, irrespective of the combination of input data, an inference result difference to be generated between the inference model that uses network A and the inference model that uses network B.

As described above, the information processing method according to Embodiment 3 trains a second inference model also using the error between an inference result for first data obtained from a first inference model (a first inference result) and an inference result for the first data obtained from a second inference model (a second inference result). Accordingly, since the second inference model is trained not only to reduce the difference between projection results but also to directly reduce the difference between an inference result obtained using the first inference model and an inference result obtained using the second inference model, it is possible to further reduce the inference result difference to be generated between these two inference models.

The configuration according to Embodiment 3 may be applied to Embodiment 2. In this case, a third inference model is trained also using the error between an inference result for first data obtained from a first inference model (a first inference result) and an inference result for the first data obtained from a second inference model (a second inference result). The second inference model is updated by obtaining a new second inference model from the trained third inference model through a conversion process. Accordingly, it is possible to further reduce the difference between an inference result to be obtained using the first inference model and an inference result to be obtained using the second inference model, and thus further reduce the inference result difference to be generated between these two inference models.

Embodiment 4

Embodiment 4 describes an information processing method and an information processing system that are different from the information processing method and the information processing system according to each of Embodiments 1 through 3, and that reduce, irrespective of the combination of input data, an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar.

Hereinafter, an information processing system according to Embodiment 4 configured by modifying part of information processing system 10A according to Embodiment 1 will be described. Elements of the information processing system according to Embodiment 4 that are the same as those included in information processing system 10A according to Embodiment 1 have already been described; they are therefore assigned like reference signs, and detailed description thereof is omitted. The following focuses on the differences from information processing system 10A.

The present embodiment describes aspects different from those of the information processing method and the information processing system according to each of Embodiments 1 through 3. Elements that are substantially the same as those described in each of Embodiments 1 through 3 are assigned like reference signs, and detailed description thereof is omitted.

FIG. 12 is a block diagram illustrating the functional configuration of information processing system 10D according to Embodiment 4. Information processing system 10D is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

As illustrated in FIG. 12, information processing system 10D includes first inference unit 11A, second inference unit 12A, output converter 13A, space projector 14D, error calculator 15D, trainer 16A, training controller 17D, and first selector 51D.

Among the elements included in information processing system 10D, first inference unit 11A, second inference unit 12A, output converter 13A, and trainer 16A are the same as those included in information processing system 10A according to Embodiment 1. The following therefore describes space projector 14D, error calculator 15D, training controller 17D, and first selector 51D in detail.

Space projector 14D also has the following functions in addition to functions that are the same as those of space projector 14A according to Embodiment 1. Specifically, space projector 14D obtains a projection process resulting from recording or changing performed by first selector 51D. Space projector 14D then projects a conversion result obtained from output converter 13A using the projection process obtained from first selector 51D, and outputs the projection result regarding the conversion result that has been input. Space projector 14D also notifies first selector 51D of the projection process executed by space projector 14D. The details of the projection process recording and changing performed by first selector 51D will be described later.

Error calculator 15D also has the following function in addition to functions that are the same as those of error calculator 15A according to Embodiment 1. Specifically, error calculator 15D outputs, to first selector 51D, error information calculated based on a projection result obtained from space projector 14D (also referred to as a first error).

First selector 51D changes a projection process to be executed by space projector 14D so that a value indicated by error information to be calculated by error calculator 15D increases. Specifically, first selector 51D records a combination of an executed projection process and calculated error information, and changes the projection process based on one or more recorded combinations. First selector 51D records a combination of a projection process to be executed by space projector 14D and error information to be calculated by error calculator 15D, based on the result of comparing (i) the combination of the projection process executed by space projector 14D and the error information calculated by error calculator 15D with (ii) each of one or more combinations each being made up of a projection process and error information which are placed in the record by first selector 51D.

Specifically, first, first selector 51D obtains a projection process executed by space projector 14D and a first error calculated by error calculator 15D. First selector 51D refers to one or more combinations each being made up of a projection process and error information recorded by first selector 51D. When the combination of the obtained projection process and the obtained first error is not present among the one or more recorded combinations, that is, when the combination of the obtained projection process and the obtained first error is a combination of a projection process and error information obtained for the first time by first selector 51D, first selector 51D records the projection process and the first error. When the combination of the obtained projection process and the obtained first error is present among the one or more recorded combinations, first selector 51D compares the first error with error information in a recorded combination matching the combination. When the first error is greater than the error information in the recorded combination, first selector 51D records the projection process and the first error. When the first error is less than the error information in the recorded combination, first selector 51D does not perform the recording process and keeps the projection process and the error information in the recorded combination.
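
The recording rule described above can be sketched in Python as follows. This is only one possible reading: it assumes that recorded combinations are matched by an identifier of the projection process and that the record is a dictionary; both the identifier strings and the function name are hypothetical.

```python
def record_if_larger(record, projection_process, first_error):
    """Record the pair when it is new or when the error is larger.

    record maps a projection-process identifier to the largest
    error information recorded for it so far (an assumed layout).
    Returns True when the record was updated.
    """
    if projection_process not in record:
        # Combination obtained for the first time: record it.
        record[projection_process] = first_error
        return True
    if first_error > record[projection_process]:
        # Larger error than the recorded one: overwrite the record.
        record[projection_process] = first_error
        return True
    # Otherwise keep the recorded combination unchanged.
    return False


record = {}
record_if_larger(record, "projection_A", 0.20)   # first time: recorded
record_if_larger(record, "projection_A", 0.10)   # smaller: record kept
record_if_larger(record, "projection_A", 0.35)   # larger: record replaced
```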

First selector 51D refers to a history of comparisons made between combinations, and when there is a projection process candidate that has not yet been compared, changes a projection process to be executed by space projector 14D. When there is no such projection process candidate, first selector 51D changes the projection process to be executed by space projector 14D to a projection process placed in the record by first selector 51D, and ends the projection process recording and changing processes. The projection process may be, for example, a process of projecting input to an inner product space (projection process A) or a process of reducing the number of dimensions of input (projection process B). The process of reducing the number of dimensions of input may include principal component analysis. Projection process candidates may include projection processes of different types such as projection process A and projection process B, or projection processes of the same type each having process parameters different from any of the other projection processes.
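
As a rough sketch of iterating over projection process candidates, the Python example below evaluates each candidate once and then selects the one with the largest recorded error. The candidate names, the two helper projections, the sample conversion results, and the choice of the largest error as the selection criterion are assumptions for illustration; in particular, keeping the first m components is a crude stand-in for dimension reduction and is not principal component analysis.

```python
def squared_error(p, q):
    """Sum-of-squares error between two projection results."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def keep_first_dims(m):
    """Candidate that keeps only the first m components."""
    return lambda v: v[:m]


def project_onto_axes(axes):
    """Candidate that takes inner products with given axes."""
    return lambda v: [sum(a * x for a, x in zip(axis, v)) for axis in axes]


# Hypothetical projection process candidates.
candidates = {
    "reduce_to_1": keep_first_dims(1),
    "reduce_to_2": keep_first_dims(2),
    "axes_sum_diff": project_onto_axes([[1, 1, 1], [1, -1, 0]]),
}

# Hypothetical first and second conversion results.
first_conversion = [0.2, 0.9, 0.1]
second_conversion = [0.2, 0.5, 0.4]

record = {}
for name, project in candidates.items():      # visit every candidate once
    record[name] = squared_error(
        project(first_conversion), project(second_conversion)
    )

# No candidate is left: adopt the recorded one with the largest error.
selected = max(record, key=record.get)
```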

The projection process recording and changing processes performed by first selector 51D may be ended based on a threshold value that is preset. In that case, first selector 51D compares the first error and the threshold value. When the first error is greater than the threshold value, first selector 51D records the projection process and the first error, changes the projection process to be executed by space projector 14D to the projection process recorded by first selector 51D, and ends the projection process recording and changing processes. When the first error is less than the threshold value, first selector 51D refers to one or more combinations each being made up of a projection process and error information recorded by first selector 51D and repeats the subsequent processes in the same manner, to change the projection process to be executed by space projector 14D.

The projection process recording and changing processes performed by first selector 51D may be performed again based on network B updated by trainer 16A. In that case, first selector 51D receives an instruction from training controller 17D and executes the processes described above in the same manner, to change the projection process to be executed by space projector 14D.

Training controller 17D also has the following functions in addition to functions that are the same as those of training controller 17A according to Embodiment 1. Specifically, training controller 17D causes first selector 51D to perform again the projection process changing process based on network B updated by trainer 16A. For example, training controller 17D further trains an inference model that uses network B by causing each of first inference unit 11A and second inference unit 12A to input new input data and causing first inference unit 11A, second inference unit 12A, output converter 13A, space projector 14D, error calculator 15D, first selector 51D, and trainer 16A to perform the above processing again, using network A, new network B, and new inputs.

FIG. 13 is a diagram illustrating a process of changing a projection process in information processing system 10D according to Embodiment 4.

A process from when input data is input by first inference unit 11A until when an error is calculated by error calculator 15D is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

After the error is calculated by error calculator 15D, first selector 51D changes a projection process to be executed by space projector 14D so that an error to be calculated by error calculator 15D increases. When the projection process to be executed by space projector 14D is a projection process of reducing the number of dimensions of an n-dimensional input to obtain an m-dimensional projection result (n>m), for example, the process of changing the projection process includes a process of increasing or decreasing the dimension m of the projection result, which is a parameter, and a process of changing a combination of projection axes. The details of the procedure of the projection process changing process performed by first selector 51D will be described later.

FIG. 14 is a diagram illustrating training conducted by second inference unit 12A in information processing system 10D according to Embodiment 4.

A process from when input data is input by first inference unit 11A until when network B is updated by trainer 16A is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

A projection process to be executed by space projector 14D is a process resulting from the projection process changing performed by first selector 51D. When first selector 51D changes the projection process to be executed by space projector 14D to a projection process of reducing the number of dimensions of an n-dimensional input to obtain a k-dimensional projection result (n>k), for example, space projector 14D obtains input x=(x1, x2, . . . , xn) and outputs projection result z=(z1, z2, . . . , zk).
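As a minimal sketch of such a dimension-reducing projection, the following Python example keeps only a chosen combination of coordinate axes. The helper name project_onto_axes and the choice of plain axis selection are assumptions made for illustration; the projection process actually executed by space projector 14D is not limited to this form.

```python
from typing import Sequence, Tuple


def project_onto_axes(x: Sequence[float], axes: Sequence[int]) -> Tuple[float, ...]:
    """Project an n-dimensional input onto k selected axes (hypothetical helper).

    `axes` lists the indices of the coordinates to keep, so k = len(axes).
    """
    if len(axes) >= len(x):
        raise ValueError("the projection must reduce the number of dimensions (n > k)")
    return tuple(x[i] for i in axes)


# Example: n = 4 input reduced to a k = 2 projection result.
x = (0.5, -1.0, 2.0, 0.0)
z = project_onto_axes(x, axes=(0, 2))
print(z)
```

Changing the length of axes corresponds to increasing or decreasing the dimension of the projection result, and changing its contents corresponds to changing the combination of projection axes.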

Error calculator 15D calculates the error between projection results output by space projector 14D. The error is, for example, the norm (distance) between projection results in a projection space, and the norm is calculated using, for example, a function utilizing the sum of squares error between sets of coordinates each indicating a different one of the projection results. In other words, when first projection result z1=(z11, z12, . . . , z1k) and second projection result z2=(z21, z22, . . . , z2k) are output as the projection results output by space projector 14D, error calculator 15D calculates the sum of squares error between projection result z1 and projection result z2 using the following Expression 3. An error calculation method is not limited to the above example.

(z11−z21)²+(z12−z22)²+ . . . +(z1k−z2k)²  Expression 3
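Expression 3 can be written directly in code. The following Python sketch is illustrative only; the function name sum_of_squares_error is an assumption and does not appear in the embodiments.

```python
from typing import Sequence


def sum_of_squares_error(z1: Sequence[float], z2: Sequence[float]) -> float:
    """Sum of squared coordinate differences between two projection results."""
    if len(z1) != len(z2):
        raise ValueError("both projection results must have the same dimension k")
    return sum((a - b) ** 2 for a, b in zip(z1, z2))


# Example with k = 3: (1-1)^2 + (2-0)^2 + (3-5)^2
print(sum_of_squares_error((1.0, 2.0, 3.0), (1.0, 0.0, 5.0)))  # 8.0
```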

Trainer 16A adjusts coefficients included in network B to reduce an error to be calculated by error calculator 15D. In doing so, trainer 16A refers to a loss function and adjusts the coefficients so that the error decreases. Trainer 16A thus updates network B by adjusting the coefficients in network B.
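The coefficient adjustment can be pictured with the following Python sketch. It rests on several assumptions that are not stated in the embodiments: network B is replaced by a single linear layer, the conversion and projection processes are treated as identity, the loss function is the sum of squares error, and the update rule is plain gradient descent. The names train_step and learning_rate are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_step(
    weights: List[List[float]],
    x: Sequence[float],
    target: Sequence[float],
    learning_rate: float = 0.05,
) -> Tuple[List[List[float]], float]:
    """One gradient-descent step on the sum of squares error (illustrative only).

    `target` stands in for the first projection result and the linear layer
    output stands in for the second projection result.
    """
    prediction = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    residual = [p - t for p, t in zip(prediction, target)]
    error = sum(r * r for r in residual)
    # d(error) / d(w_ij) = 2 * residual_i * x_j
    updated = [
        [w - learning_rate * 2.0 * r * xi for w, xi in zip(row, x)]
        for row, r in zip(weights, residual)
    ]
    return updated, error


weights = [[0.0, 0.0], [0.0, 0.0]]
for step in range(3):
    weights, error = train_step(weights, x=(1.0, 2.0), target=(1.0, -1.0))
    print(step, error)  # the error shrinks at every step
```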

The following describes processing executed by information processing system 10D configured as described above.

FIG. 15 and FIG. 16 are each a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10D according to Embodiment 4. FIG. 15 is a flowchart illustrating processes resulting from excluding the projection process changing process from processes executed by information processing system 10D according to Embodiment 4. FIG. 16 is a flowchart illustrating the projection process changing process among the processes executed by information processing system 10D according to Embodiment 4.

The processes included in steps S101 through S109 in FIG. 15 are the same processes as those performed by information processing system 10A according to Embodiment 1 (see FIG. 3, for instance).

In step S161, first selector 51D first determines whether to perform a threshold process to be described later. When determining not to perform the threshold process, first selector 51D performs a process in step S163 to be described later. When determining to perform the threshold process, first selector 51D determines whether error E1 calculated by error calculator 15D in step S107 is greater than a threshold value determined in advance. When error E1 is greater than the threshold value, first selector 51D performs a process in step S162 to be described later. When error E1 is less than the threshold value, first selector 51D performs the process in step S163.

In step S162, first selector 51D records the combination of the projection process executed by space projector 14D and error E1.

In step S163, first selector 51D determines whether the error calculation process performed by error calculator 15D in step S107 is an error calculation performed for the first time or whether error E1 calculated by error calculator 15D in step S107 is greater than a recorded error. In other words, first selector 51D refers to one or more combinations recorded by first selector 51D and, when the combination of the projection process executed by space projector 14D and error E1 is not in the record, determines that the error calculation process is an error calculation performed for the first time and performs a process in step S162D to be described later. When the combination is in the record and error E1 is greater than an error in a recorded combination matching the combination, first selector 51D performs the process in step S162D. When error E1 is less than the error in the recorded combination, first selector 51D performs a process in step S164 to be described later.

In step S162D, first selector 51D records the combination of the projection process performed by space projector 14D and error E1.

In step S164, first selector 51D refers to a history of comparisons made between combinations, and determines whether there is any projection process candidate that has not yet been compared. When such a projection process candidate is present, first selector 51D performs a process in step S165 to be described later. When there is no such projection process candidate, first selector 51D performs a process in step S105D to be described later.

In step S165, first selector 51D changes the projection process to be executed by space projector 14D.

In step S105D, space projector 14D performs, on the first conversion result obtained by output converter 13A in step S103, the projection process in the combination recorded by first selector 51D in step S162 or step S162D, and obtains a first projection result.

In step S106D, space projector 14D performs, on the second conversion result obtained by output converter 13A in step S104, the projection process in the combination recorded by first selector 51D in step S162 or step S162D, and obtains a second projection result.

In step S107D, error calculator 15D calculates error E1 between the first projection result obtained by space projector 14D in step S105D and the second projection result obtained by space projector 14D in step S106D.

In step S166, training controller 17D determines whether first selector 51D is to change the projection process again every time trainer 16A updates coefficients in network B. When first selector 51D changes the projection process again based on updated network B, information processing system 10D returns to the process in step S105. When first selector 51D does not change the projection process again, information processing system 10D returns to the process in step S105D and repeats the same sequence of processes as described above.

Through the sequence of the processes described above, information processing system 10D changes a projection process so that error E1 between a first projection result and a second projection result increases, and then trains an inference model that uses network B to reduce error E1 between a first projection result and a second projection result. As a result, it is possible to conduct training by machine learning more smoothly than in the case of not changing the projection process. Stated differently, it is possible to inhibit the training from being slowed.
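The projection process changing process in steps S161 through S165 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the projection process candidates are every combination of k coordinate axes, the error is the sum of squares error, and the function name select_projection and the parameter threshold are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def select_projection(
    c1: Sequence[float],
    c2: Sequence[float],
    k: int,
    threshold: Optional[float] = None,
) -> Tuple[Tuple[int, ...], float]:
    """Return the recorded (axes, error) pair for two conversion results.

    Each candidate is a combination of k projection axes. The candidate with
    the greatest error is kept in the record. If `threshold` is given, the
    search ends as soon as a candidate's error exceeds it.
    """
    recorded_axes: Tuple[int, ...] = ()
    recorded_error = float("-inf")
    for axes in combinations(range(len(c1)), k):
        error = sum((c1[i] - c2[i]) ** 2 for i in axes)
        if threshold is not None and error > threshold:
            return axes, error  # threshold process: record and end
        if error > recorded_error:
            recorded_axes, recorded_error = axes, error  # update the record
    return recorded_axes, recorded_error


c1 = (1.0, 0.0, 3.0)
c2 = (1.0, 2.0, 0.0)
print(select_projection(c1, c2, k=1))                 # ((2,), 9.0)
print(select_projection(c1, c2, k=1, threshold=3.0))  # ((1,), 4.0)
```

With the threshold, the search stops at the first candidate whose error is large enough, which trades the greatest possible error for a shorter search.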

As described above, the information processing method according to Embodiment 4 changes a projection process so that the error between a first projection result and a second projection result (a first error) increases. Accordingly, it is possible to conduct training by machine learning more smoothly than in the case of not changing the projection process. Stated differently, it is possible to inhibit the training from being slowed.

A first projection result and a second projection result are obtained by projecting a first conversion result and a second conversion result to a space where an inner product is defined. Accordingly, it is possible to define the norm between the first projection result and the second projection result, thereby training a second inference model to, for example, reduce the norm. As a result, the information processing method can reduce an inference result difference to be generated between two inference models.

A first projection result and a second projection result are obtained by reducing the number of dimensions of a first conversion result and the number of dimensions of a second conversion result. Accordingly, it is possible to, for example, first select a projection axis presenting the difference between a first conversion result and a second conversion result, subsequently perform a process of reducing the number of dimensions other than the selected projection axis, and obtain the first projection result and the second projection result. As a result, the information processing method can further shorten a time required for calculating the error between the first projection result and the second projection result. In addition, the information processing method can effectively reduce an inference result difference to be generated between two inference models.

A first projection result and a second projection result are obtained by performing principal component analysis on a first conversion result and a second conversion result, and subsequently performing the process of reducing the number of dimensions. Accordingly, it is possible to clarify the difference between the first projection result and the second projection result since one or more principal components other than at least one specific principal component are removed. A principal component that is likely to produce a greater error (distance) between the distribution of the first projection result and the distribution of the second projection result than any other principal component may be set as the specific principal component. As a result, the information processing method can further shorten a time required for calculating the error between the first projection result and the second projection result. In addition, the information processing method can effectively reduce an inference result difference to be generated between two inference models.

The configuration according to Embodiment 4 may be applied to Embodiment 2 or Embodiment 3.

Embodiment 5

Embodiment 5 describes an information processing method and an information processing system that are different from the information processing method and the information processing system according to each of Embodiments 1 through 4, and that reduce, irrespective of the combination of input data, an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar.

Hereinafter, an information processing system according to Embodiment 5 configured by modifying part of information processing system 10A according to Embodiment 1 will be described.

FIG. 17 is a block diagram illustrating the functional configuration of information processing system 10E according to Embodiment 5. Information processing system 10E is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

As illustrated in FIG. 17, information processing system 10E includes first inference unit 11A, second inference unit 12A, output converter 13E, space projector 14A, error calculator 15E, trainer 16A, training controller 17E, and second selector 52E.

Among the elements included in information processing system 10E, first inference unit 11A, second inference unit 12A, space projector 14A, and trainer 16A are the same as those included in information processing system 10A according to Embodiment 1. The following therefore describes output converter 13E, error calculator 15E, training controller 17E, and second selector 52E in detail.

Output converter 13E also has the following functions in addition to functions that are the same as those of output converter 13A according to Embodiment 1. Specifically, output converter 13E obtains a conversion process resulting from recording or changing performed by second selector 52E. Output converter 13E then converts feature information obtained from first inference unit 11A and feature information obtained from second inference unit 12A, using the conversion process obtained from second selector 52E, and outputs conversion results for the input feature information. Output converter 13E also notifies second selector 52E of the conversion process executed by output converter 13E. The details of the conversion process recording and changing performed by second selector 52E will be described later.

Error calculator 15E also has the following function in addition to functions that are the same as those of error calculator 15A according to Embodiment 1. Specifically, error calculator 15E outputs, to second selector 52E, error information calculated based on projection results obtained from space projector 14A (also referred to as a first error).

Second selector 52E changes a conversion process to be executed by output converter 13E so that a value indicated by error information to be calculated by error calculator 15E increases. Specifically, second selector 52E records a combination of an executed conversion process and calculated error information, and changes the conversion process based on one or more recorded combinations. Second selector 52E records a combination of a conversion process to be executed by output converter 13E and error information to be calculated by error calculator 15E, based on the result of comparing (i) the combination of the conversion process executed by output converter 13E and the error information calculated by error calculator 15E with (ii) each of one or more combinations each being made up of a conversion process and error information which are placed in the record by second selector 52E.

Specifically, first, second selector 52E obtains a conversion process executed by output converter 13E and a first error calculated by error calculator 15E. Second selector 52E refers to one or more combinations each being made up of a conversion process and error information recorded by second selector 52E. When the combination of the obtained conversion process and the obtained first error is not present among the one or more recorded combinations, that is, when the combination of the obtained conversion process and the obtained first error is a combination of a conversion process and error information obtained for the first time by second selector 52E, second selector 52E records the conversion process and the first error. When the combination of the obtained conversion process and the obtained first error is present among the one or more recorded combinations, second selector 52E compares the first error with error information in a recorded combination matching the combination. When the first error is greater than the error information in the recorded combination, second selector 52E records the conversion process and the first error. When the first error is less than the error information, second selector 52E does not perform the recording process and keeps the conversion process and the error information in the recorded combination.

Second selector 52E refers to a history of comparisons made between combinations. When there is a conversion process candidate that has not yet been compared, second selector 52E changes a conversion process to be executed by output converter 13E. When there is no such conversion process candidate, second selector 52E changes the conversion process to be executed by output converter 13E to a conversion process placed in the record by second selector 52E, and ends the conversion process recording and changing processes. The conversion process may be, for example, a process of performing scale conversion on an input. Conversion process candidates may include conversion processes of different types such as conversion process A and conversion process B, or conversion processes of the same type each having process parameters different from those of any of the other conversion processes. The conversion process may be a conversion process using a neural network model and the process parameters may be neural network coefficients.

The conversion process recording and changing processes performed by second selector 52E may be ended based on a preset threshold value. In that case, second selector 52E compares the first error with the threshold value. When the first error is greater than the threshold value, second selector 52E records the conversion process and the first error, changes the conversion process to be executed by output converter 13E to the conversion process recorded by second selector 52E, and ends the conversion process recording and changing processes. When the first error is less than the threshold value, second selector 52E refers to one or more combinations each being made up of a conversion process and error information recorded by second selector 52E, and repeats the subsequent processes in the same manner, to change the conversion process to be executed by output converter 13E.

The conversion process recording and changing processes performed by second selector 52E may be performed again based on network B updated by trainer 16A. In that case, second selector 52E receives an instruction from training controller 17E and executes the processes described above in the same manner, to change the conversion process.

Training controller 17E also has the following functions in addition to functions that are the same as those of training controller 17A according to Embodiment 1. Specifically, training controller 17E causes second selector 52E to perform the conversion process changing process again based on network B updated by trainer 16A. For example, training controller 17E further trains an inference model that uses network B by causing each of first inference unit 11A and second inference unit 12A to input new input data and causing first inference unit 11A, second inference unit 12A, output converter 13E, space projector 14A, error calculator 15E, second selector 52E, and trainer 16A to perform the above processing again, using network A, new network B, and new inputs.

FIG. 18 is a diagram illustrating a process of changing a conversion process in information processing system 10E according to Embodiment 5.

A process from when input data is input by first inference unit 11A until when an error is calculated by error calculator 15E is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

After the error is calculated by error calculator 15E, second selector 52E changes a conversion process to be executed by output converter 13E so that an error to be calculated by error calculator 15E increases. When the conversion process to be executed by output converter 13E is scale conversion f of changing the range of a value indicated by feature information, which is obtained using Equation 1, for example, the process of changing the conversion process includes a process of changing parameter a and a process of changing a function used for scale conversion to scale conversion f2 obtained using the following Equation 4. The details of the procedure of the conversion process changing process performed by second selector 52E will be described later.

f2(x)=a×tanh(x)  Equation 4

FIG. 19 is a diagram illustrating training conducted by second inference unit 12A in information processing system 10E according to Embodiment 5.

A process from when input data is input by first inference unit 11A until when network B is updated by trainer 16A is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

A conversion process to be executed by output converter 13E is a process resulting from the conversion process changing performed by second selector 52E. When second selector 52E changes the conversion process to be executed by output converter 13E to scale conversion f2 obtained using Equation 4, for example, output converter 13E obtains input x and outputs conversion result f2(x).
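Scale conversion f2 of Equation 4 can be sketched in Python as follows. The function name scale_conversion_f2 and the element-wise application to a list of feature values are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def scale_conversion_f2(x: Sequence[float], a: float = 1.0) -> List[float]:
    """Apply f2(x) = a * tanh(x) to each value of the feature information."""
    return [a * math.tanh(value) for value in x]


# Every output lies in the open interval (-a, a), so parameter a sets the range.
print(scale_conversion_f2([-3.0, 0.0, 0.5, 3.0], a=2.0))
```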

The subsequent processes after the conversion results are input to space projector 14A are the same as those included in the training performed in information processing system 10A according to Embodiment 1.

The following describes processing executed by information processing system 10E configured as described above.

FIG. 20 and FIG. 21 are each a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10E according to Embodiment 5. FIG. 20 is a flowchart illustrating processes resulting from excluding the conversion process changing process from the processes executed by information processing system 10E according to Embodiment 5. FIG. 21 is a flowchart illustrating the conversion process changing process among the processes executed by information processing system 10E according to Embodiment 5. The processes included in steps S101 through S109 in FIG. 20 are the same processes as those performed in information processing system 10A according to Embodiment 1 (see FIG. 3).

In step S161E, second selector 52E first determines whether to perform a threshold process to be described later. When determining not to perform the threshold process, second selector 52E performs a process in step S163E to be described later. When determining to perform the threshold process, second selector 52E determines whether error E1 calculated by error calculator 15E in step S107 is greater than a threshold value determined in advance. When error E1 is greater than the threshold value, second selector 52E performs a process in step S181 to be described later. When error E1 is less than the threshold value, second selector 52E performs the process in step S163E.

In step S181, second selector 52E records the combination of the conversion process executed by output converter 13E and error E1.

In step S163E, second selector 52E determines whether the error calculation process performed by error calculator 15E in step S107 is an error calculation performed for the first time or whether error E1 calculated by error calculator 15E in step S107 is greater than a recorded error. In other words, second selector 52E refers to one or more combinations recorded by second selector 52E and, when the combination of the conversion process performed by output converter 13E and error E1 is not in the record, determines that the error calculation process is an error calculation performed for the first time and performs a process in step S181E to be described later. When the combination is in the record and error E1 is greater than an error in a recorded combination matching the combination, second selector 52E performs the process in step S181E. When error E1 is less than the error in the recorded combination, second selector 52E performs a process in step S182 to be described later.

In step S181E, second selector 52E records the combination of the conversion process performed by output converter 13E and error E1.

In step S182, second selector 52E refers to a history of comparisons made between combinations, and determines whether there is any conversion process candidate that has not yet been compared. When such a conversion process candidate is present, second selector 52E performs a process in step S183 to be described later. When there is no such conversion process candidate, second selector 52E performs a process in step S103E to be described later.

In step S183, second selector 52E changes the conversion process to be executed by output converter 13E.

In step S103E, output converter 13E performs, on the first feature information obtained by first inference unit 11A in step S101, the conversion process in the combination recorded by second selector 52E in step S181 or step S181E, and obtains a first conversion result.

In step S104E, output converter 13E performs, on the second feature information obtained by second inference unit 12A in step S102, the conversion process in the combination recorded by second selector 52E in step S181 or step S181E, and obtains a second conversion result.

In step S105E, space projector 14A performs a projection process on the first conversion result obtained by output converter 13E in step S103E, and obtains a first projection result.

In step S106E, space projector 14A performs the projection process on the second conversion result obtained by output converter 13E in step S104E, and obtains a second projection result.

In step S107E, error calculator 15E calculates error E1 between the first projection result obtained by space projector 14A in step S105E and the second projection result obtained by space projector 14A in step S106E.

In step S102E, second inference unit 12A inputs input data to the inference model that uses network B, and obtains second feature information via network B.

In step S184, training controller 17E determines whether second selector 52E is to change the conversion process again every time trainer 16A updates coefficients in network B. When second selector 52E changes the conversion process again based on updated network B, information processing system 10E returns to step S103. When second selector 52E does not change the conversion process again, information processing system 10E returns to step S103E and repeats the same sequence of processes as described above.

Through the sequence of the processes described above, information processing system 10E changes a conversion process so that error E1 between a first projection result and a second projection result increases, and then trains an inference model that uses network B to reduce error E1 between the first projection result and the second projection result. As a result, it is possible to conduct training by machine learning more smoothly than in the case of not changing the conversion process. Stated differently, it is possible to inhibit the training from being slowed.
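The conversion process changing process in steps S163E through S183 can be sketched in Python as follows. The sketch assumes that the conversion process candidates are scale conversions of the Equation 4 type with different values of parameter a, that the projection process is identity, and that the error is the sum of squares error. The function name select_conversion and the candidate labels are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def select_conversion(
    feature1: Sequence[float],
    feature2: Sequence[float],
    candidates: Dict[str, Callable[[float], float]],
) -> Tuple[str, float]:
    """Return the recorded (conversion name, error) with the greatest error."""
    if not candidates:
        raise ValueError("at least one conversion process candidate is required")
    record = None
    for name, convert in candidates.items():
        c1 = [convert(v) for v in feature1]
        c2 = [convert(v) for v in feature2]
        error = sum((a - b) ** 2 for a, b in zip(c1, c2))
        # Record on the first calculation, or when the error exceeds the record.
        if record is None or error > record[1]:
            record = (name, error)
    return record


candidates = {
    "tanh_a1": lambda v: 1.0 * math.tanh(v),  # Equation 4 with a = 1
    "tanh_a2": lambda v: 2.0 * math.tanh(v),  # Equation 4 with a = 2
}
print(select_conversion((0.0, 1.0), (0.5, 1.0), candidates))
```

In this example the candidate with a = 2 is recorded, because doubling parameter a quadruples the sum of squares error between the two conversion results.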

As described above, the information processing method according to Embodiment 5 changes a conversion process so that the error between a first projection result and a second projection result (a first error) increases. Accordingly, it is possible to conduct training by machine learning more smoothly than in the case of not changing the conversion process. Stated differently, it is possible to inhibit the training from being slowed.

The configuration according to Embodiment 5 may be applied to Embodiment 2 or Embodiment 3.

Embodiment 6

Embodiment 6 describes an information processing method and an information processing system that are different from the information processing method and the information processing system according to each of Embodiments 1 through 5, and that reduce, irrespective of the combination of input data, an inference result difference that may be generated when obtaining a new inference model using an inference model as an exemplar.

Hereinafter, an information processing system according to Embodiment 6 configured by modifying part of information processing system 10A according to Embodiment 1 will be described.

FIG. 22 is a block diagram illustrating the functional configuration of information processing system 10F according to Embodiment 6. Information processing system 10F is a system for obtaining a new inference model trained to output the same inference result as that obtained using an existing inference model.

As illustrated in FIG. 22, information processing system 10F includes first inference unit 11A, second inference unit 12A, output converter 13F, space projector 14F, error calculator 15F, trainer 16A, training controller 17F, and third selector 53F.

Among the elements included in information processing system 10F, first inference unit 11A, second inference unit 12A, and trainer 16A are the same as those included in information processing system 10A according to Embodiment 1. The following therefore describes output converter 13F, space projector 14F, error calculator 15F, training controller 17F, and third selector 53F in detail.

Output converter 13F also has the following functions in addition to functions that are the same as those of output converter 13A according to Embodiment 1. Specifically, output converter 13F obtains a conversion process resulting from recording or changing performed by third selector 53F. Output converter 13F then converts feature information obtained from first inference unit 11A and feature information obtained from second inference unit 12A, using the conversion process obtained from third selector 53F, and outputs conversion results for the input feature information. Output converter 13F also notifies third selector 53F of the conversion process executed by output converter 13F. The details of the conversion process recording and changing performed by third selector 53F will be described later.

Space projector 14F also has the following functions in addition to functions that are the same as those of space projector 14A according to Embodiment 1. Specifically, space projector 14F obtains a projection process resulting from recording or changing performed by third selector 53F. Space projector 14F then projects conversion results obtained from output converter 13F, using the projection process obtained from third selector 53F, and outputs projection results for the conversion results that have been input. Space projector 14F also notifies third selector 53F of the projection process executed by space projector 14F. The details of the projection process recording and changing performed by third selector 53F will be described later.

Error calculator 15F also has the following function in addition to functions that are the same as those of error calculator 15A according to Embodiment 1. Specifically, error calculator 15F outputs, to third selector 53F, error information calculated based on projection results obtained from space projector 14F (also referred to as a first error).

Third selector 53F changes a conversion process to be executed by output converter 13F and a projection process to be executed by space projector 14F so that a value indicated by error information to be calculated by error calculator 15F increases. Specifically, third selector 53F records a combination of an executed conversion process, an executed projection process, and calculated error information, and changes the conversion process and the projection process based on one or more recorded combinations. Third selector 53F records a combination of a conversion process to be executed by output converter 13F, a projection process to be executed by space projector 14F, and error information to be calculated by error calculator 15F based on the result of comparing (i) the combination of the conversion process executed by output converter 13F, the projection process executed by space projector 14F, and the error information calculated by error calculator 15F with (ii) each of one or more combinations each being made up of a conversion process, a projection process, and error information which are placed in the record by third selector 53F.

Specifically, first, third selector 53F obtains a conversion process executed by output converter 13F, a projection process executed by space projector 14F, and a first error calculated by error calculator 15F. Third selector 53F refers to one or more combinations each being made up of a conversion process, a projection process, and error information recorded by third selector 53F. When the combination of the obtained conversion process, the obtained projection process, and the obtained first error is not present among the one or more recorded combinations, that is, when the combination of the obtained conversion process, the obtained projection process, and the obtained first error is a combination of a conversion process, a projection process, and error information obtained for the first time by third selector 53F, third selector 53F records the conversion process, the projection process, and the first error. When the combination of the obtained conversion process, the obtained projection process, and the obtained first error is present among the one or more recorded combinations, third selector 53F compares the first error with error information in a recorded combination matching the combination. When the first error is greater than the error information in the recorded combination, third selector 53F records the conversion process, the projection process, and the first error. When the first error is less than the error information in the recorded combination, third selector 53F does not perform the recording process and keeps the conversion process, the projection process, and the error information in the recorded combination.
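The recording rule described above can be sketched as follows. This is a minimal illustration and not the implementation of third selector 53F: the function name update_record, the string identifiers for processes, and the choice to key the record by the pair (conversion process, projection process) are all assumptions made for the example. The text does not state what happens when the first error equals the recorded error; the sketch keeps the existing record in that case.

```python
from typing import Dict, Tuple

# Hypothetical key: (conversion process name, projection process name).
Key = Tuple[str, str]


def update_record(record: Dict[Key, float], conversion: str,
                  projection: str, first_error: float) -> bool:
    """Record the combination if it is new or its error exceeds the stored one.

    Returns True when the record was written, False when it was kept as is.
    """
    key = (conversion, projection)
    if key not in record or first_error > record[key]:
        # First occurrence, or a greater first error: overwrite the record.
        record[key] = first_error
        return True
    # Smaller (or, by assumption, equal) first error: keep the record.
    return False


record: Dict[Key, float] = {}
print(update_record(record, "scale_x2", "pca", 0.30))  # new combination
print(update_record(record, "scale_x2", "pca", 0.20))  # smaller error
print(update_record(record, "scale_x2", "pca", 0.45))  # greater error
print(record)
```

In this sketch the record ends up holding only the greatest first error seen for each pair of processes.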

Third selector 53F refers to a history of comparisons made between combinations. When there is a candidate for the combination of a conversion process and a projection process which has not yet been compared, third selector 53F changes a conversion process to be executed by output converter 13F and a projection process to be executed by space projector 14F. When there is no such candidate, third selector 53F changes the conversion process to be executed by output converter 13F and the projection process to be executed by space projector 14F to a conversion process and a projection process which are placed in the record by third selector 53F, respectively, and ends the changing and recording processes of the combination of a conversion process and a projection process. The conversion process is the same as that described in Embodiment 5 and the projection process is the same as that described in Embodiment 4.

The recording and changing processes of the combination of a conversion process and a projection process, which are performed by third selector 53F, may be ended based on a threshold value that is preset. In that case, third selector 53F compares the first error and the threshold value. When the first error is greater than the threshold value, third selector 53F records the combination of the conversion process, the projection process, and the first error, changes the conversion process to be executed by output converter 13F to the conversion process recorded by third selector 53F, changes the projection process to be executed by space projector 14F to the projection process recorded by third selector 53F, and ends the recording and changing processes of the combination of a conversion process and a projection process. When the first error is less than the threshold value, third selector 53F refers to one or more combinations each being made up of a conversion process, a projection process, and error information recorded by third selector 53F, and repeats the subsequent processes in the same manner, to change the conversion process and the projection process.
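The threshold-based termination can be sketched as a search that stops at the first combination whose first error exceeds the threshold value and otherwise falls back to the combination with the greatest first error. The function select_combination, the error table TOY_ERRORS, and the process names are hypothetical and exist only for this illustration; the fallback is a simplification of the record comparison described earlier.

```python
from typing import Callable, Iterable, Tuple

Combo = Tuple[str, str]  # (conversion process name, projection process name)


def select_combination(candidates: Iterable[Combo],
                       first_error_of: Callable[[Combo], float],
                       threshold: float) -> Tuple[Combo, float, bool]:
    """Return (combination, first error, whether the threshold ended the search)."""
    best = None
    for combo in candidates:
        err = first_error_of(combo)
        if err > threshold:
            # The first error exceeds the threshold: adopt and stop.
            return combo, err, True
        if best is None or err > best[1]:
            # Otherwise remember the greatest first error seen so far.
            best = (combo, err)
    if best is None:
        raise ValueError("no candidate combinations were supplied")
    return best[0], best[1], False


# Made-up first errors, for illustration only.
TOY_ERRORS = {
    ("identity", "pca"): 0.1,
    ("scale", "pca"): 0.6,
    ("scale", "random"): 0.9,
}
result = select_combination(TOY_ERRORS, TOY_ERRORS.__getitem__, threshold=0.5)
print(result)
```

With a threshold of 0.5 the search ends at the second candidate without evaluating the third, which is the point of the threshold: it trades the best possible combination for an earlier end to the search.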

The recording and changing processes of the combination of a conversion process and a projection process, which are performed by third selector 53F, may be performed again based on network B updated by trainer 16A. In that case, third selector 53F receives an instruction from training controller 17F and executes the processes described above in the same manner, to change the conversion process and the projection process.

In the process of changing the combination of a conversion process and a projection process, which is performed by third selector 53F, the process of changing the conversion process and the process of changing the projection process may be performed in order or at the same time. Specifically, in the case where the two changing processes are performed in order by third selector 53F, (i) the conversion process is changed so that the first error increases, and then based on the changed conversion process, the projection process is changed so that the first error increases, or (ii) the projection process is changed so that the first error increases, and then based on the changed projection process, the conversion process is changed so that the first error increases. In the case where the two changing processes are performed at the same time by third selector 53F, the combination of a conversion process and a projection process is changed so that the first error increases. A well-known technique such as Bayesian optimization may be employed for the changing method in this case.
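The difference between changing the two processes in order and changing them at the same time can be illustrated with a small made-up error table. In this sketch the simultaneous case uses an exhaustive search over all combinations in place of Bayesian optimization, which is not implemented here; the names CONVERSIONS, PROJECTIONS, TOY_ERROR, sequential, and simultaneous are hypothetical.

```python
from itertools import product

CONVERSIONS = ["identity", "scale", "shift"]
PROJECTIONS = ["pca", "random"]

# Made-up first errors for each (conversion, projection) pair.
TOY_ERROR = {
    ("identity", "pca"): 0.10, ("identity", "random"): 0.20,
    ("scale", "pca"): 0.50, ("scale", "random"): 0.30,
    ("shift", "pca"): 0.40, ("shift", "random"): 0.70,
}


def sequential(start_projection):
    """Case (i): change the conversion process first, then the projection process."""
    conv = max(CONVERSIONS, key=lambda c: TOY_ERROR[(c, start_projection)])
    proj = max(PROJECTIONS, key=lambda p: TOY_ERROR[(conv, p)])
    return conv, proj


def simultaneous():
    """Change both at once (exhaustive search stands in for Bayesian optimization)."""
    return max(product(CONVERSIONS, PROJECTIONS), key=TOY_ERROR.get)


print(sequential("pca"))   # ('scale', 'pca')
print(simultaneous())      # ('shift', 'random')
```

With this table the sequential search settles on a combination with a first error of 0.50, while the joint search finds 0.70. The sequential order needs fewer evaluations but can miss the combination with the greatest first error.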

Training controller 17F has the following function in addition to the same functions as those of training controller 17A according to Embodiment 1. Specifically, based on network B updated by trainer 16A, training controller 17F causes third selector 53F to perform again the process of changing the combination of a conversion process and a projection process. For example, training controller 17F causes each of first inference unit 11A and second inference unit 12A to input new input data and causes first inference unit 11A, second inference unit 12A, output converter 13F, space projector 14F, error calculator 15F, third selector 53F, and trainer 16A to execute again the above processes, to further train an inference model that uses network B.

FIG. 23 is a diagram illustrating a process of changing the combination of a conversion process and a projection process in information processing system 10F according to Embodiment 6.

A process from when input data is input by first inference unit 11A until when an error is calculated by error calculator 15F is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

After the error is calculated by error calculator 15F, third selector 53F changes a conversion process to be executed by output converter 13F and a projection process to be executed by space projector 14F, so that an error to be calculated by error calculator 15F increases. The details of the changing of the conversion process are the same as those described in Embodiment 5. The details of the changing of the projection process are the same as those described in Embodiment 4.

FIG. 24 is a diagram illustrating training conducted by second inference unit 12A in information processing system 10F according to Embodiment 6.

A process from when input data is input by first inference unit 11A until when network B is updated by trainer 16A is the same as that included in the training performed in information processing system 10A according to Embodiment 1.

A conversion process to be executed by output converter 13F is a process resulting from the conversion process changing process performed by third selector 53F. The details of a process performed by output converter 13F when third selector 53F changes the conversion process are the same as those of the process performed in information processing system 10E according to Embodiment 5.

A projection process to be executed by space projector 14F is a process resulting from the projection process changing process performed by third selector 53F. The details of a process performed by space projector 14F when third selector 53F changes the projection process are the same as those of the process performed in information processing system 10D according to Embodiment 4.

The subsequent processes after projection results are input to error calculator 15F are the same as those included in the training performed in information processing system 10A according to Embodiment 1.

The following describes processing executed by information processing system 10F configured as described above.

FIG. 25 and FIG. 26 are each a flowchart illustrating processing (also referred to as an information processing method) executed by information processing system 10F according to Embodiment 6. FIG. 25 is a flowchart illustrating processes resulting from excluding the process of changing the combination of a conversion process and a projection process from processes executed by information processing system 10F according to Embodiment 6. FIG. 26 is a flowchart illustrating the process of changing the combination of a conversion process and a projection process among the processes executed by information processing system 10F according to Embodiment 6.

The processes included in steps S101 through S109 in FIG. 25 are the same processes as those performed in information processing system 10A according to Embodiment 1 (see FIG. 3).

In step S161F, third selector 53F firstly determines whether to perform a threshold process to be described later. When determining not to perform the threshold process, third selector 53F performs a process in step S163F to be described later. When determining to perform the threshold process, third selector 53F determines whether error E1 calculated by error calculator 15F in step S107 is greater than a threshold value determined in advance. When error E1 is greater than the threshold value, third selector 53F performs a process in step S191 to be described later. When error E1 is less than the threshold value, third selector 53F performs the process in step S163F.

In step S191, third selector 53F records the combination of the conversion process executed by output converter 13F, the projection process executed by space projector 14F, and error E1.

In step S163F, third selector 53F determines whether the error calculation process performed by error calculator 15F in step S107 is error calculation performed for the first time or whether error E1 calculated by error calculator 15F in step S107 is greater than a recorded error. In other words, third selector 53F refers to one or more combinations recorded by third selector 53F and, when the combination of the conversion process executed by output converter 13F, the projection process executed by space projector 14F, and error E1 is not in the record, determines that the error calculation process is error calculation performed for the first time and performs a process in step S191F to be described later. When the combination is in the record and error E1 is greater than an error in a recorded combination matching the combination, third selector 53F performs the process in step S191F. When error E1 is less than the error in the recorded combination, third selector 53F performs a process in step S192 to be described later.

In step S191F, third selector 53F records the combination of the conversion process executed by output converter 13F, the projection process executed by space projector 14F, and error E1.

In step S192, third selector 53F refers to a history of comparisons made between combinations, and determines whether there is any candidate for the combination of a conversion process and a projection process which has not yet been compared. When there is such a candidate, third selector 53F performs a process in step S193 to be described later. When there is no such candidate, third selector 53F performs a process in step S103F to be described later.

In step S193, third selector 53F changes the conversion process to be executed by output converter 13F and the projection process to be executed by space projector 14F.

In step S103F, output converter 13F performs, on the first feature information obtained by first inference unit 11A in step S101, the conversion process in the combination recorded by third selector 53F in step S191 or step S191F, and obtains a first conversion result.

In step S104F, output converter 13F performs, on the second feature information obtained by second inference unit 12A in step S102, the conversion process in the combination recorded by third selector 53F in step S191 or step S191F, and obtains a second conversion result.

In step S105F, space projector 14F performs, on the first conversion result obtained by output converter 13F in step S103F, the projection process in the combination recorded by third selector 53F in step S191 or step S191F, and obtains a first projection result.

In step S106F, space projector 14F performs, on the second conversion result obtained by output converter 13F in step S104F, the projection process in the combination recorded by third selector 53F in step S191 or step S191F, and obtains a second projection result.

In step S107F, error calculator 15F calculates error E1 between the first projection result obtained by space projector 14F in step S105F and the second projection result obtained by space projector 14F in step S106F.

In step S102F, second inference unit 12A inputs input data to the inference model that uses network B, and obtains second feature information via network B.

In step S194, training controller 17F determines whether third selector 53F is to change again the combination of a conversion process and a projection process every time trainer 16A updates coefficients in network B. When third selector 53F changes again the combination of a conversion process and a projection process based on updated network B, information processing system 10F returns to step S103. When third selector 53F does not change again the combination of a conversion process and a projection process, information processing system 10F returns to step S103F and repeats the same sequence of processes as described above.

Through the sequence of the processes described above, information processing system 10F changes the combination of a conversion process and a projection process so that error E1 between a first projection result and a second projection result increases, and then trains an inference model that uses network B to reduce error E1 between the first projection result and the second projection result. As a result, it is possible to conduct training by machine learning more smoothly than in the case of not changing at least one of the conversion process or the projection process. Stated differently, it is possible to inhibit the training from being slowed.
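One way to see why a greater error E1 can help training is a toy gradient-descent example. The sketch below is an illustration under strong simplifying assumptions and not the training performed by trainer 16A: network B is reduced to a single coefficient, the conversion process is a plain scale conversion, the projection process is the identity, and error E1 is a squared error. The function train and all its parameters are hypothetical.

```python
def train(scale, steps=20, lr=0.05):
    """Train a one-coefficient 'student' toward a 'teacher' on a scaled squared error."""
    teacher_w, student_w, x = 2.0, 0.0, 1.0
    for _ in range(steps):
        # Difference between the converted outputs of the two models.
        diff = scale * (student_w - teacher_w) * x
        # Gradient of diff**2 with respect to student_w.
        grad = 2.0 * diff * scale * x
        student_w -= lr * grad
    return student_w


print(train(scale=1.0))  # no scale conversion
print(train(scale=2.0))  # scale conversion that enlarges the error
```

For the same learning rate and number of steps, the run with the larger scale ends closer to the teacher coefficient of 2.0, because the enlarged error yields a larger gradient. In this toy setting the effect is equivalent to raising the learning rate, so it should be read only as intuition for the statement above.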

As described above, the information processing method according to Embodiment 6 changes the combination of a conversion process and a projection process so that the error between a first projection result and a second projection result (a first error) increases. Accordingly, it is possible to conduct training by machine learning more smoothly than in the case of not changing at least one of the conversion process or the projection process. Stated differently, it is possible to inhibit the training from being slowed.

The configuration according to the present embodiment may be applied to Embodiment 2 or Embodiment 3.

Variation

A process of determining whether a conversion process is necessary may be performed in the processing executed by each of information processing systems 10A through 10F according to Embodiments 1 through 6. The following describes a variation for the case of determining whether a conversion process is necessary. Although the following describes, as an example, a process executed by information processing system 10A according to Embodiment 1, the same process is performed also in the case where any one of information processing systems 10B through 10F according to Embodiments 2 through 6 performs the process.

FIG. 27 is a flowchart illustrating processing executed by an information processing system according to a variation.

The processes included in steps S101 through S109 in FIG. 27 are the same processes as those performed in information processing system 10A according to Embodiment 1 (see FIG. 3).

In step S110, error calculator 15A determines whether to perform the determination of whether a conversion process is necessary. When the determination is not performed, the process in step S108 is performed. When the determination is performed, a process in step S111 to be described later is performed.

In step S111, space projector 14A performs a projection process on the first feature information obtained by first inference unit 11A via network A in step S101, and obtains a first non-conversion projection result.

In step S112, space projector 14A performs the projection process on the second feature information obtained by second inference unit 12A via network B in step S102, and obtains a second non-conversion projection result.

In step S113, error calculator 15A calculates error E0 between the first non-conversion projection result obtained by space projector 14A in step S111 and the second non-conversion projection result obtained by space projector 14A in step S112.

In step S114, error calculator 15A determines whether error E1 calculated by error calculator 15A in step S107 is greater than error E0 calculated by error calculator 15A in step S113. When error E1 is greater than error E0, error calculator 15A determines that a conversion process is necessary and the process in step S108 is performed. When error E1 is less than error E0, error calculator 15A determines that a conversion process is unnecessary, and a process in step S108A to be described later is performed.

In the process in step S114, when error E0 is greater than a predetermined threshold value, error calculator 15A may determine that a conversion process is necessary, and when error E0 is less than the predetermined threshold value, error calculator 15A may determine that a conversion process is unnecessary.

In step S108A, trainer 16A updates coefficients in network B using error E0 calculated in step S113 to reduce error E0, and performs the process in step S109.
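Steps S111 through S114 can be sketched on small vectors as follows. The concrete choices are assumptions made for illustration: the conversion process is a scale conversion by a fixed factor, the projection process is a fixed linear map that reduces three dimensions to two, and the error is a sum of squared differences. The helper names and the sample feature values are hypothetical.

```python
def scale_conversion(features, factor):
    """Assumed conversion process: multiply every feature by a factor."""
    return [factor * x for x in features]


def project(features, matrix):
    """Assumed projection process: a fixed linear map to fewer dimensions."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def squared_error(a, b):
    """Assumed error: sum of squared differences between projection results."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


first_features = [1.0, 2.0, 3.0]    # stands in for the output via network A
second_features = [1.0, 2.5, 2.0]   # stands in for the output via network B
matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
factor = 2.0

# Steps S111 to S113: error E0 without the conversion process.
e0 = squared_error(project(first_features, matrix),
                   project(second_features, matrix))

# Error E1 with the conversion process applied before projection.
e1 = squared_error(project(scale_conversion(first_features, factor), matrix),
                   project(scale_conversion(second_features, factor), matrix))

# Step S114: the conversion process is judged necessary when E1 > E0.
conversion_needed = e1 > e0
training_error = e1 if conversion_needed else e0
print(e0, e1, conversion_needed)  # 0.25 1.0 True
```

Here the scale conversion enlarges the error from 0.25 to 1.0, so the conversion process is judged necessary and training would proceed with error E1; otherwise error E0 would be used as in step S108A.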

(Supplementary Information)

Each of the functions included in each of information processing systems 10A through 10F can be realized by a predetermined program being executed by a processor (e.g., a CPU) (not shown in the drawings).

Each of the elements in each of Embodiments 1 through 6 and the variation may be configured in the form of a dedicated hardware product, or may be realized by executing a software program suitable for the element. Each of the elements may be realized by a program executing unit such as a CPU or a processor reading and executing the software program recorded on a recording medium such as a hard disk or a semiconductor memory. Here, the software program for realizing the information processing device according to each of the embodiments and the variation, for instance, is a program described below.

The program causes a computer to execute an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.

Alternatively, the program causes a computer to execute an information processing method that is executed by a processor and includes: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model. The conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, where the first non-conversion projection result is obtained by performing the projection process on the first feature information, and the second non-conversion projection result is obtained by performing the projection process on the second feature information.
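As one example of a model conversion process, the claims mention quantizing a neural network model by converting a coefficient from a floating-point format to a fixed-point format. A minimal sketch of such a conversion for a single coefficient is shown below. The 16-bit width, the 8 fractional bits, the saturation behavior, and the function names are assumptions chosen for illustration and are not specified in the present disclosure.

```python
def to_fixed_point(value, frac_bits=8, total_bits=16):
    """Convert a float coefficient to a signed fixed-point integer (assumed format)."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    # Round to the nearest representable step, then saturate to the range.
    return max(lowest, min(highest, round(value * scale)))


def from_fixed_point(q, frac_bits=8):
    """Recover the approximate float value of a fixed-point coefficient."""
    return q / (1 << frac_bits)


coefficients = [0.5, -1.25, 0.3]
quantized = [to_fixed_point(c) for c in coefficients]
restored = [from_fixed_point(q) for q in quantized]
print(quantized)
print(restored)
```

Coefficients that are exact multiples of the step size (1/256 with these settings) survive the round trip unchanged, while others, such as 0.3, pick up a small quantization error.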

These programs are recorded on, for example, a computer-readable non-transitory recording medium.

As described above, examples of techniques disclosed in the present application have been described based on embodiments and a variation. The present disclosure, however, is not limited to these embodiments and variation. Various modifications to the embodiments and variation which may be conceived by those skilled in the art, as well as embodiments resulting from combinations of elements from different embodiments and variation are included within the scope of the present disclosure so long as they do not depart from the essence of the present disclosure.

INDUSTRIAL APPLICABILITY

The present disclosure is applicable to a system that generates a new inference model using an existing inference model as an exemplar.

1. An information processing method executed by a processor, the information processing method comprising: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error, wherein the conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, the first non-conversion projection result being obtained by performing the projection process on the first feature information, the second non-conversion projection result being obtained by performing the projection process on the second feature information.
 2. The information processing method according to claim 1, wherein in the training of the second inference model, the second inference model is trained by machine learning using also a second error indicating a difference between a first inference result and a second inference result, the first inference result being additionally obtained by inputting the first data to the first inference model, the second inference result being additionally obtained by inputting the first data to the second inference model.
 3. The information processing method according to claim 1, further comprising: changing the projection process to increase the first error.
 4. The information processing method according to claim 1, further comprising: changing the conversion process to increase the first error.
 5. The information processing method according to claim 1, further comprising: changing a combination of the conversion process and the projection process to increase the first error.
 6. The information processing method according to claim 1, wherein the conversion process includes a process of performing scale conversion on an input.
 7. The information processing method according to claim 1, wherein the projection process includes a process of projecting input to an inner product space.
 8. The information processing method according to claim 1, wherein the projection process includes a process of reducing a total number of dimensions of input.
 9. The information processing method according to claim 8, wherein the process of reducing the total number of dimensions includes principal component analysis.
 10. The information processing method according to claim 1, wherein the first data is image data.
 11. An information processing method executed by a processor, the information processing method comprising: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model, wherein the conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, the first non-conversion projection result being obtained by performing the projection process on the first feature information, the second non-conversion projection result being obtained by performing the projection process on the second feature information.
 12. The information processing method according to claim 11, wherein in the training of the third inference model, the third inference model is trained by machine learning using also a second error indicating a difference between a first inference result and a second inference result, the first inference result being additionally obtained by inputting the first data to the first inference model, the second inference result being additionally obtained by inputting the first data to the second inference model.
 13. The information processing method according to claim 11, further comprising: changing the projection process to increase the first error.
 14. The information processing method according to claim 11, further comprising: changing the conversion process to increase the first error.
 15. The information processing method according to claim 11, further comprising: changing a combination of the conversion process and the projection process to increase the first error.
 16. The information processing method according to claim 11, wherein the first inference model, the second inference model, and the third inference model are each a neural network model, and the model conversion process includes a process of compressing the neural network model.
 17. The information processing method according to claim 16, wherein the process of compressing the neural network model includes a process of quantizing the neural network model.
 18. The information processing method according to claim 17, wherein the process of quantizing the neural network model includes a process of converting a coefficient in the neural network model from a floating-point format to a fixed-point format.
 19. The information processing method according to claim 16, wherein the process of compressing the neural network model includes a process of reducing a total number of nodes in the neural network model or a process of removing a connection between nodes in the neural network model.
 20. The information processing method according to claim 11, wherein the conversion process includes a process of performing scale conversion on an input.
 21. The information processing method according to claim 11, wherein the projection process includes a process of projecting input to an inner product space.
 22. The information processing method according to claim 11, wherein the projection process includes a process of reducing a total number of dimensions of input.
 23. The information processing method according to claim 22, wherein the process of reducing the total number of dimensions includes principal component analysis.
 24. The information processing method according to claim 11, wherein the first data is image data.
 25. An information processing system comprising: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result, wherein the second inference model is a model obtained by executing an information processing method, the information processing method being executed by a processor and including: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; and training the second inference model by machine learning to reduce the first error, and the conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, the first non-conversion projection result being obtained by performing the projection process on the first feature information, the second non-conversion projection result being obtained by performing the projection process on the second feature information.
 26. An information processing system comprising: an obtainer that obtains second data; and an inference unit that inputs the second data obtained by the obtainer to a second inference model, and obtains and outputs a second inference result, wherein the second inference model is a model obtained by executing an information processing method, the information processing method being executed by a processor and including: inputting first data to a first inference model to obtain first feature information; inputting the first data to a second inference model to obtain second feature information; performing a conversion process on the first feature information to obtain a first conversion result; performing the conversion process on the second feature information to obtain a second conversion result; performing a projection process on the first conversion result to obtain a first projection result; performing the projection process on the second conversion result to obtain a second projection result; obtaining a first error indicating an error between the first projection result and the second projection result; training a third inference model by machine learning to reduce the first error; and performing a model conversion process of converting the trained third inference model, to update the second inference model, and the conversion process produces an error between the first projection result and the second projection result that is greater than an error between a first non-conversion projection result and a second non-conversion projection result, the first non-conversion projection result being obtained by performing the projection process on the first feature information, the second non-conversion projection result being obtained by performing the projection process on the second feature information.